PGI User Forum

Compiling on AMD Opteron: Loop not vectorized: not countable

rhavlin

Posted: Tue Aug 31, 2004 8:46 am    Post subject: Compiling on AMD Opteron: Loop not vectorized: not countable

I am attempting to compile Gaussian 03, and pgf77 (5.1-3 linux86-64) gives me the message:
"Loop not vectorized: not countable"

whereas when I compile the same source for the 32-bit architecture, it seems to vectorize fine. Details below:

" 9, Unrolling inner loop 4 times
Generated prefetch instructions for 2 loads and stores
Timing stats:
vectorize 16 millisecs 100%
Total time 16 millisecs"


vs. (with linux86-64)

" 9, Loop not vectorized: not countable
Timing stats:
Total time 0 millisecs"


Perhaps there is no real problem here, but it looks like there might be.

Thanks,
Bob

mkcolg (Location: The Portland Group Inc.)

Posted: Wed Sep 01, 2004 3:13 pm    Post subject: Nothing comes to mind

Sorry for getting back to you so late, but I've been pondering this. Unfortunately, nothing solid comes to mind.

Some guesses might be:

You're using slightly different options for the 64-bit build, which produces different behavior.

The 64-bit source was ported from the 32-bit version and is therefore slightly different. Are the define (-D) flags the same?

Try compiling the file on a 64-bit system using the standard flags plus "-tp k8-64", then recompile using "-tp k8-32". Do you still see a difference? The "-tp" option tells the compiler the target architecture: k8-64 generates 64-bit code for Opteron, while k8-32 generates 32-bit code.
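The two-target experiment above can be scripted roughly as follows. This is a sketch, not a PGI-supplied tool: it assumes pgf77 is on the PATH, the function names compile_msgs and compare_tp are hypothetical, and FLAGS is a placeholder for your "standard" flags. It must include -Minfo/-Mneginfo, or no loop messages are printed. Each run writes an object file in the current directory.

```shell
#!/bin/sh
# Sketch (assumptions: pgf77 is on PATH; FLAGS stands in for your
# standard compile flags and must include -Minfo -Mneginfo).
FLAGS=${FLAGS:-"-O2 -fast -Minfo -Mneginfo"}

# compile_msgs SRC TARGET: compile SRC for one -tp target and print the
# compiler's messages (stdout and stderr merged). Writes an object file.
compile_msgs() {
    # $FLAGS is deliberately unquoted so it splits into separate options.
    pgf77 $FLAGS -tp "$2" -c "$1" 2>&1
}

# compare_tp SRC: for k8-64 and k8-32, print any "not countable"
# messages, or "(none)" if that target produced none.
compare_tp() {
    for tp in k8-64 k8-32; do
        echo "== -tp $tp =="
        compile_msgs "$1" "$tp" | grep "not countable" || echo "(none)"
    done
}

# Usage: compare_tp aabs.f
```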

- Mat

rhavlin

Posted: Wed Sep 01, 2004 7:28 pm    Post subject: More info on the Opteron Compilation Problem with Gaussian03

Thanks for the suggestions Mat! Below I provide more information as per your suggestions. I hope it helps!

1) I noticed this while comparing two compilations, one targeting k7 and one targeting k8-64:

k7:
pgf77 -mp -O2 -tp k7 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:262144,prefetch -c aabs.f
aabs:
9, Unrolling inner loop 4 times
Generated prefetch instructions for 2 loads and stores
Timing stats:
Total time 0 millisecs


opteron (k8-64):
pgf77 -i8 '-mcmodel=medium' -mp -O2 -tp k8-64 -Mreentrant -Mrecursive -Mnosave -Minfo -Mneginfo -time -fast -Munroll -Mvect=assoc,recog,cachesize:1048576 -c aabs.f
aabs:
9, Loop not vectorized: not countable
Timing stats:
init 16 millisecs 100%
Total time 16 millisecs



2) As you suggested, I ran the same compile line, changing only -tp from "k8-64" to "k8-32". The "not countable" message no longer appears, and the compiler appears to unroll the loops.

Hmm... I'm not sure where to go from here.

mkcolg (Location: The Portland Group Inc.)

Posted: Thu Sep 02, 2004 10:58 am    Post subject: 2 Possible Causes

I see two possible causes: the code generator being used, and the "-i8" flag.

We actually use two separate 32-bit code generators (CGs): one for older x87 systems and a second for SSE2-enabled systems. k8-64 targets use only the SSE code generator. In your example, the k7 build uses the old CG and the k8-64 build uses the new CG. To test this theory, compile with and without "-Mscalarsse" on a k8 or p4 system; "-Mscalarsse" tells the compiler to use the new CG. To determine which CG is actually being used, compile with "-v" and see which directory pgftn is invoked from: "../linux86/5.1/bin/p3/pgftn" is the old CG and "linux86/5.1/bin/newcg/pgftn" is the new one.
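The "-v" check can be automated by searching the verbose driver output for the two pgftn directories named above. This is a sketch: the function name which_cg is hypothetical, and it assumes the -v output contains the pgftn path in the form quoted in the post. It matches only the trailing bin/p3/pgftn or bin/newcg/pgftn portion, because the install prefix varies. The -tp k8-32 flag in the usage lines is an assumed example of a 32-bit build on a k8 system.

```shell
# which_cg: read the output of a `pgf77 -v ...` run on stdin and report
# which code generator's pgftn was invoked. Matches only the tail of the
# paths quoted above, since the install prefix varies between sites.
which_cg() {
    log=$(cat)
    case "$log" in
        */bin/newcg/pgftn*) echo "new CG (SSE2)" ;;
        */bin/p3/pgftn*)    echo "old CG (x87)" ;;
        *)                  echo "unknown" ;;
    esac
}

# Usage (assumed flags; substitute your own 32-bit compile line):
#   pgf77 -v -tp k8-32 -c aabs.f 2>&1 | which_cg
#   pgf77 -v -tp k8-32 -Mscalarsse -c aabs.f 2>&1 | which_cg
```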

The second possibility is "-i8". With the 5.0 and 5.1 compilers, we were missing some optimization opportunities when "-i8" was present, and this might be one of them. We greatly enhanced our "-i8" optimizations in 5.2, so you might want to try the newer release. As an experiment, you can also try compiling without "-i8". Of course, keep "-i8" in your actual build, since you might need it for C and Fortran interoperability.
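The "-i8" experiment can be sketched in shell as well. This assumes pgf77 is on the PATH, and the helper names without_flag and i8_experiment are hypothetical. The script runs your compile flags once as given and once with "-i8" removed, then prints any "not countable" messages from each run. The flags you pass must include -Minfo. The filtered list is word-split, so this assumes no flag contains embedded spaces.

```shell
# without_flag FLAG ARG...: print ARG... with every exact occurrence of
# FLAG removed, space-separated (assumes no argument contains spaces).
without_flag() {
    drop=$1
    shift
    kept=""
    for a in "$@"; do
        [ "$a" = "$drop" ] || kept="$kept $a"
    done
    printf '%s\n' "${kept# }"
}

# i8_experiment SRC FLAG...: compile SRC with the given flags, then again
# with -i8 removed, printing the "not countable" messages from each run.
# FLAG... should include -Minfo (and usually -Mneginfo).
i8_experiment() {
    src=$1
    shift
    echo "== flags as given =="
    pgf77 "$@" -c "$src" 2>&1 | grep "not countable" || echo "(none)"
    echo "== without -i8 =="
    # Unquoted on purpose: split the filtered list back into options.
    pgf77 $(without_flag -i8 "$@") -c "$src" 2>&1 | grep "not countable" || echo "(none)"
}
```

With the k8-64 line from this thread, a call might look like `i8_experiment aabs.f -i8 -mcmodel=medium -mp -O2 -tp k8-64 -fast -Minfo -Mneginfo`. Only -i8 differs between the two runs, so any change in the messages points at that flag.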

- Mat
All times are GMT - 7 Hours