PGI User Forum


Loop unrolling (PGI 5.1 and 5.2: pgf77)
mkrech



Joined: 15 Oct 2004
Posts: 11

Posted: Tue May 03, 2005 4:34 am    Post subject: Loop unrolling (PGI 5.1 and 5.2: pgf77)

Hi forum,

One of our users is seeing subtle differences in the results of his code (a "bulky"
density functional solver) when loop unrolling is activated during compilation.
Some initial results differ by 1.0e-5, leading to sizeable differences in the final
(electron density) results. The code contains about 700 loops, so there is little
chance of pinpointing the one(s) that cause the trouble. We haven't tried 6.0 yet,
but my understanding of loop unrolling is that neither the order of the statements
inside the loop nor the order in which they are executed is changed. I would
therefore expect differences to be at most within the computational accuracy
(double precision); the observed deviations, however, are far bigger.

Has anyone noticed similar effects with loop unrolling?

Many thanks,
Michael
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Tue May 03, 2005 11:52 am

Hi Michael,

What type of system is this being run on, and what flags are being used with each run? If you're on a 32-bit system, this sounds more like an x87 precision issue than an unrolling issue (see http://www.pgroup.com/support/execute.htm#precision).
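
The kind of precision effect meant here can be sketched by analogy. The x87 FPU keeps intermediates in registers that are wider than the double-precision values stored in memory, so the result depends on how often a value is rounded back to storage precision. The Python sketch below is only an analogue and assumes a substitution: IEEE single precision stands in for the storage format and Python's double for the wider register. The function names are hypothetical.

```python
import struct

def round_to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_rounded_every_step(values):
    """Analogue of writing the accumulator back to memory after each add."""
    total = 0.0
    for v in values:
        total = round_to_single(total + v)
    return total

def sum_rounded_once(values):
    """Analogue of keeping the accumulator in a wider register until the end."""
    total = 0.0
    for v in values:
        total += v
    return round_to_single(total)

# 2**24 is the point where single precision can no longer resolve "+ 1".
values = [16777216.0, 1.0, 1.0]
print(sum_rounded_every_step(values))  # 16777216.0
print(sum_rounded_once(values))        # 16777218.0
```

Both functions perform the same additions in the same order; only the points at which rounding happens differ.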

- Mat
mkrech



Joined: 15 Oct 2004
Posts: 11

Posted: Wed May 04, 2005 2:41 am

Hi Mat,

The system is an AMD Opteron, and the compiler flags are either -fast or all options
in -fast except loop unrolling. I checked this with one of my Monte-Carlo codes and
5.2 but didn't see any difference, even at higher precision than mentioned above
(6 to 8 significant digits). I checked it again with all optimizations turned off and
found no difference.

I also checked 5.2 against 6.0 with and without loop unrolling on a 32-bit Xeon and
again did not find any difference. The last thing to do is a check of his code with
6.0 on an AMD Opteron.

By the way, my tests always started the Monte-Carlo run from the same random number
seed, so the results are predetermined digit by digit by the sequence of operations.
If something in the MC code is upset by optimization, some results usually change quite
drastically in the course of the simulation because some Monte-Carlo updates go
another way.
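
A fixed-seed check of this kind could look roughly like the sketch below. It is a toy random walk in Python, not the actual Monte-Carlo code (which is not shown in this thread); `run_walk`, `first_divergence`, the seed and the step count are all hypothetical.

```python
import random

def run_walk(seed, steps, threshold=0.5):
    """Toy Monte-Carlo run: each update goes up or down based on one random draw."""
    rng = random.Random(seed)  # private generator, fully determined by the seed
    position = 0
    trace = []
    for _ in range(steps):
        position += 1 if rng.random() < threshold else -1
        trace.append(position)
    return trace

def first_divergence(a, b):
    """Index of the first step where two traces differ, or None if they agree."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None

reference = run_walk(seed=12345, steps=1000)
repeat = run_walk(seed=12345, steps=1000)
print(first_divergence(reference, repeat))  # None
```

Comparing a reference trace against one produced by a differently compiled build would show the first update that went the other way, if there is one. Once a single accept/reject decision flips, every later position in the trace is shifted.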

Best regards,
Michael
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Wed May 04, 2005 3:18 pm

Hi Michael,

"-Munroll" shouldn't have any effect on the order of operations. It can cause values to be stored in registers longer, but this would only affect precision when using the x87 FPU.

When you say "all options in -fast except loop unrolling", what exact flags are being used? I ask because the exact meaning of "-fast" can change, and if the user is looking at an older manual then he/she might have missed a flag. (Note: to find the most up-to-date meaning of "-fast", execute "pgf90 -help -fast" from the command line.) Specifically, I'm wondering if "-Mlre" was included. "-Mlre" performs loop-carried redundancy elimination and can have an impact on the loop's operations. If I'm correct, have the user compile with "-fast -Mnolre" to turn off LRE.
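
The difference between an unrolling that keeps the order of operations and a transformation that changes it can be sketched as follows. This is an illustration in Python with hypothetical function names, not a description of what any PGI flag generates. The two-accumulator variant is included only as an example of a reordering and is not attributed to "-Munroll" or "-Mlre".

```python
def sum_rolled(a):
    """Reference loop: one add per iteration."""
    s = 0.0
    for x in a:
        s += x
    return s

def sum_unrolled_same_order(a):
    """Unrolled by 2, but the adds happen in exactly the original order."""
    s = 0.0
    n = len(a) - len(a) % 2
    for i in range(0, n, 2):
        s += a[i]
        s += a[i + 1]
    for x in a[n:]:  # remainder iteration for odd lengths
        s += x
    return s

def sum_unrolled_two_accumulators(a):
    """Unrolled by 2 with separate partial sums: the adds are reassociated."""
    s0 = s1 = 0.0
    n = len(a) - len(a) % 2
    for i in range(0, n, 2):
        s0 += a[i]
        s1 += a[i + 1]
    for x in a[n:]:
        s0 += x
    return s0 + s1

data = [1e16, 1.0, -1e16, 1.0]
print(sum_rolled(data))                     # 1.0
print(sum_unrolled_same_order(data))        # 1.0
print(sum_unrolled_two_accumulators(data))  # 2.0
```

The order-preserving version is bit-identical to the rolled loop. Only the reassociated version can differ, and with ill-conditioned data the difference can be far larger than the last digit.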

- Mat
mkrech



Joined: 15 Oct 2004
Posts: 11

Posted: Tue May 17, 2005 8:08 am

Hi Mat,

Sorry for the long delay; I was at a workshop, and in the meantime tests with 6.0
have been made on our Opterons. The results may be a relief for PGI, because
the program now crashes depending on the optimizations used. As I could not
reproduce any of the effects of loop unrolling with my own test codes, this indicates
that a bug in the program is causing the problems. In fact, an old part of the Fortran
code contains a subroutine to which all arguments are passed by reference to a single
array that holds arguments of various types via 'equivalence' statements. The
cause of the trouble may be the optimization-dependent placement of variables in
memory, about which the subroutine most likely makes invalid assumptions.
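
The general hazard can be sketched as follows. The actual subroutine is not shown in this thread, so this is only a Python analogue using a raw byte buffer. The layout, the offsets and the function names are all assumptions made for illustration.

```python
import struct

def caller_pack(count, value):
    """Caller's layout: a 4-byte integer at offset 0, an 8-byte real at offset 8."""
    buf = bytearray(16)
    struct.pack_into("<i", buf, 0, count)
    struct.pack_into("<d", buf, 8, value)
    return buf

def callee_read(buf, value_offset):
    """Callee reads the same buffer, trusting its own idea of where the real lives."""
    count = struct.unpack_from("<i", buf, 0)[0]
    value = struct.unpack_from("<d", buf, value_offset)[0]
    return count, value

buf = caller_pack(7, 3.5)
print(callee_read(buf, 8))  # (7, 3.5)  callee's assumption matches the layout
print(callee_read(buf, 4))  # (7, 0.0)  callee assumes the real follows the integer directly
```

When the callee's assumed offset is wrong, nothing fails outright. It just reads a different, plausible-looking number, which fits results that shift with the optimization level before anything actually crashes.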

All I could do here was wish the (frustrated) user good luck in debugging the
'dirty' Fortran code of the subroutine.

Best regards,
Michael
All times are GMT - 7 Hours