| Author |
Message |
alechand
Joined: 14 May 2012 Posts: 21
|
Posted: Fri May 24, 2013 4:58 pm Post subject: |
|
|
Thanks Mat,
I sent you an email. |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4995 Location: The Portland Group Inc.
|
Posted: Tue May 28, 2013 8:54 am Post subject: |
|
|
Hi alechand,
It appears the difference is caused by accumulated rounding error from fused multiply-add (FMA) instructions. The same issue can be seen on the CPU at higher optimization levels. Try adding the flag "-ta=nvidia,nofma" to see if this helps.
- Mat |
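To illustrate how an FMA can change a result, here is a minimal sketch in Python. It is not the code from this thread: the values of a, b and c are chosen by hand for illustration, and the fused result is emulated with exact rational arithmetic (one rounding at the end) rather than with a hardware FMA instruction.

```python
from fractions import Fraction

def mul_add_separate(a, b, c):
    # Product is rounded to double, then the sum is rounded again.
    return a * b + c

def mul_add_fused(a, b, c):
    # Emulates an FMA: compute a*b + c exactly, then round once to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Illustrative inputs (assumed, not from the original program).
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

separate = mul_add_separate(a, b, c)  # exact product 1 - 2**-54 rounds to 1.0
fused = mul_add_fused(a, b, c)        # keeps the -2**-54 term

print(separate, fused)
```

Both answers are valid floating-point results for the same expression; they differ only in where the rounding happens.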
|
|
 |
alechand
Joined: 14 May 2012 Posts: 21
|
Posted: Tue May 28, 2013 11:01 am Post subject: |
|
|
Mat,
Unfortunately, this did not help.
I was wondering: if the results agree "exactly" using TIME=100000,
how can this be accumulated error?
Do you have any other ideas?
I really appreciate your attention. |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4995 Location: The Portland Group Inc.
|
Posted: Tue May 28, 2013 1:45 pm Post subject: |
|
|
We were able to track this down. It looks like at least a few x(i) and xold(j) values differ by no more than a small margin of error. With slight changes in precision in these cases, the dominant value may flip-flop, leading to divergent values of c1 and c2. This then has a cascading effect that leads to the eventual wrong answer.
- Mat |
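The flip-flop described above can be sketched as follows. This is a hypothetical illustration only: the thread does not show how c1 and c2 are actually computed, so select_coefficients, the comparison it uses, and all numeric values below are assumptions made for the example.

```python
def select_coefficients(x, xold):
    # Hypothetical branch: whichever value dominates decides (c1, c2).
    if x >= xold:
        return 1.0, 0.0
    return 0.0, 1.0

# Assumed values: xold sits between two roundings of the same quantity x.
xold = -2.0**-55
x_cpu = 0.0         # e.g. multiply and add rounded separately
x_gpu = -2.0**-54   # e.g. the same expression with a single rounding

coeffs_cpu = select_coefficients(x_cpu, xold)
coeffs_gpu = select_coefficients(x_gpu, xold)

print(abs(x_cpu - x_gpu))      # difference in the inputs is about 5.6e-17
print(coeffs_cpu, coeffs_gpu)  # the selected coefficients differ completely
```

In a sketch like this, the two runs would agree exactly until the first step where x and xold are this close, and only then diverge, which would be consistent with results matching for a shorter run.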
|
|
 |
|