PGI User Forum

Accuracy, Inlining, and O Levels

 
TheMatt



Joined: 06 Jul 2009
Posts: 331
Location: Greenbelt, MD

Posted: Wed Nov 25, 2009 11:45 am    Post subject: Accuracy, Inlining, and O Levels

I'm currently trying to inline (and soon reloop) a large piece of code that, eventually, will be put on GPU accelerators. But, until then, I am working solely on CPUs. My current attempt at this inlining--which requires inlining about 20 or so subroutines--has hit a possible roadblock: I seem to have lost accuracy.

To explain this, what I did was build a driver that runs two sets of calculations. It first runs the code in its full-of-calls, non-inlined, original glory. The various output arrays are then put into control arrays:
Code:
flc_control=flc

I then reinitialize everything and run the new inlined code, and then make an array that contains the absolute diffs between the new and old results:
Code:
flc_diff=abs(flc-flc_control)

Finally, I check to see if the resultant difference array is within a threshold value (in this case 1.e-08):
Code:
if (maxval(flc_diff) > thresh) then
   write (output_unit,*) "Failure with flc!"
   write (output_unit,*) maxval(flc_diff)
   write (output_unit,*) maxloc(flc_diff)
endif
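
The three fragments above can be assembled into a self-contained program. This is only a sketch: the array sizes, the random fill, and the deliberate perturbation of one element are stand-ins for the real runs, which are not shown in this thread.

```fortran
program compare_demo
   use, intrinsic :: iso_fortran_env, only: output_unit
   implicit none
   integer, parameter :: ni = 4, nj = 3      ! hypothetical array sizes
   real, parameter    :: thresh = 1.e-08     ! threshold from the post
   real :: flc(ni,nj), flc_control(ni,nj), flc_diff(ni,nj)

   ! Stand-in for the original, call-based run.
   call random_number(flc)
   flc_control = flc

   ! Stand-in for the inlined run: perturb one element so the check fires.
   flc(2,3) = flc(2,3) + 1.e-03

   ! Same comparison as in the post.
   flc_diff = abs(flc - flc_control)
   if (maxval(flc_diff) > thresh) then
      write (output_unit,*) "Failure with flc!"
      write (output_unit,*) maxval(flc_diff)
      write (output_unit,*) maxloc(flc_diff)
   else
      write (output_unit,*) "flc within threshold"
   end if
end program compare_demo
```

Because every other element is copied unchanged, maxloc points at the perturbed element (2,3).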

What I've found is that using compile options of:
Code:
FOPTS = -O0 -Kieee -r4 -Mextend -Mpreprocess -Ktrap=fp
I'm getting outputs of:
Code:
 Failure with flc!
   2.3841858E-07
         1493           14
with this value being the largest absolute difference I've seen.
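
For scale, and assuming flc is real*4 (IEEE single precision): 2.3841858E-07 is 2**(-22), which is the gap between adjacent single-precision numbers in the range [2,4). A threshold of 1.e-08 is smaller than that gap, so for elements of order 1 the check only passes when the two runs agree bit for bit. A minimal sketch that prints the relevant spacings (the program and variable names are made up for illustration):

```fortran
program single_precision_spacing
   use, intrinsic :: iso_fortran_env, only: real32
   implicit none
   real(real32) :: observed

   observed = 2.3841858e-07_real32   ! largest difference reported in the post

   ! epsilon(x): smallest e such that 1 + e > 1 in this kind (2**(-23) here).
   print *, 'epsilon(1.0)            =', epsilon(1.0_real32)
   ! spacing(x): distance from x to the next representable number.
   print *, 'spacing(1.0)            =', spacing(1.0_real32)
   print *, 'spacing(2.0)            =', spacing(2.0_real32)
   ! How many units of spacing(2.0) the observed difference amounts to.
   print *, 'observed / spacing(2.0) =', observed / spacing(2.0_real32)
end program single_precision_spacing
```

A difference this small does not by itself tell you whether the cause is roundoff or a coding error; it only says the two results differ in the last bit or two.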

This problem only cropped up after I inlined the very last subroutine call. Before this, I was getting under-threshold accuracy with even "-fast -Kieee". I am certain I'm not stepping on any variables (some renaming was needed, but I've confirmed the renamed variables work in the non-inlined case with no loss of accuracy).

I suppose my question is, should I expect better accuracy than this? I don't know how inlining code would cause more roundoff error than *not* inlining. I was expecting bit-identical results from just inlining at -O0 before I started changing the loop order.

Is there any way to get even less optimized and more accurate than "-O0 -Kieee"? Or, did I just coincidentally gather enough roundoff error with this last inline such that it makes a difference?

Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6630
Location: The Portland Group Inc.

Posted: Wed Nov 25, 2009 4:42 pm

Hi Matt,

"-O0 -Kieee" means that the compiler is doing no optimization and using strict IEEE 754-conforming intrinsics. In other words, it's most likely not an issue with precision. I would go back and recheck the code you inlined and look for possible coding errors.

Hope this helps,
Mat
TheMatt



Joined: 06 Jul 2009
Posts: 331
Location: Greenbelt, MD

Posted: Wed Nov 25, 2009 5:21 pm

mkcolg wrote:
Hi Matt,

"-O0 -Kieee" means that the compiler is doing no optimization and using strict IEEE 754-conforming intrinsics. In other words, it's most likely not an issue with precision. I would go back and recheck the code you inlined and look for possible coding errors.

Hmm. I'll see what I can do, but you might be getting an email soon. I just cannot see how this isn't working.

I did attempt to add a previously re-looped version of this subroutine instead of the straight cut-and-paste, just to see if that changed anything...and it core dumped. No error message or anything, just a straight core dump!
TheMatt



Joined: 06 Jul 2009
Posts: 331
Location: Greenbelt, MD

Posted: Mon Nov 30, 2009 8:17 am

A-yup. Somehow some temporary array was being passed from one inlined section to another. I did the "make old engineers cry" method of duplicating every single temporary array with different names and it now does work to 1e-12 precision even with "-fast -Kieee". (Whether or not real*4 can be that precise, well...)
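
The exact mechanism is not shown in this thread, so the following is only a hypothetical illustration of how a shared temporary can change results after a cut-and-paste inline. It assumes the caller and the callee each used a scratch array with the same name (tmp here); all names and values are invented.

```fortran
program shared_temp_demo
   implicit none
   integer, parameter :: n = 4
   real :: x(n), tmp(n), tmp_callee(n)
   real :: good(n), bad(n)

   x = [1.0, 2.0, 3.0, 4.0]

   ! Version A: the pasted callee body gets its own, renamed scratch array.
   tmp        = 2.0*x               ! caller's temporary
   tmp_callee = x + 1.0             ! scratch that used to be local to the callee
   good       = tmp + tmp_callee    ! caller's tmp is still intact here

   ! Version B: naive paste, the callee's scratch shares the caller's name.
   tmp = 2.0*x                      ! caller's temporary
   tmp = x + 1.0                    ! pasted callee body overwrites it
   bad = tmp + tmp                  ! caller now reads the callee's values

   print *, 'separate temporaries:', good
   print *, 'shared temporary:    ', bad
   print *, 'max abs difference:  ', maxval(abs(good - bad))
end program shared_temp_demo
```

With a separate subroutine, the callee's local array cannot collide with the caller's; once the body is pasted in, the two names refer to the same storage. Giving each inlined section its own temporaries, as described above, removes that coupling.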

I'm still not sure about the core dump, but I'm shelving that for now as I have some massive relooping coming up.

Thanks again, Mat, and apologies for wasting some SQL database entries!
mkcolg



Joined: 30 Jun 2004
Posts: 6630
Location: The Portland Group Inc.

Posted: Mon Nov 30, 2009 11:49 am

Not a problem. I'd much rather have you ask than not. I'm just glad you were able to find the error.

- Mat
All times are GMT - 7 Hours