PGI User Forum


Fails to converge when binary is compiled by latest release
jasonshih



Joined: 03 Aug 2004
Posts: 32

Posted: Mon Oct 18, 2004 10:44 am    Post subject: Fails to converge when binary is compiled by latest release

Hi,

While compiling a molecular simulation code on my SGI Itanium box, I ran into an error when running a simple test case with the resulting executable. The OS on the SGI IA64 machine is Gentoo, with glibc version 3.3.4. We have tried several combinations of options as well as various versions of the PGI compiler, and found that if the basis set includes an f channel, the integration seems to diverge. Reverting to an older compiler version does not solve the problem either. The compiler options used on IA64 are: -O2 -Mextend

With optimization turned off, the program works as usual. It fails every time optimization is switched on (actually, the job terminates normally, but the quantities are several times those obtained on the IA32 architecture):


With optimization switched off on IA64, or switched on but compiled on IA32:
-------------------------------------------------------------------------------------
Dipole Moment (Debye)
X 2.0329 Y -0.5028 Z 0.0000
Tot 2.0942
Quadrupole Moments (Debye-Ang)
XX -10.8505 XY -0.9292 YY -11.6864
XZ 0.0000 YZ 0.0000 ZZ -10.7749
Octapole Moments (Debye-Ang^2)
XXX -0.6157 XXY -1.5426 XYY -0.9108
.....
-------------------------------------------------------------------------------------

With optimization switched on on IA64:
-------------------------------------------------------------------------------------
Dipole Moment (Debye)
X -23.5496 Y 2.8812 Z 0.0000
Tot 23.7252
Quadrupole Moments (Debye-Ang)
XX -66.0876 XY 12.3932 YY -42.7407
XZ 0.0000 YZ 0.0000 ZZ -33.2484
Octapole Moments (Debye-Ang^2)
XXX -116.5504 XXY 28.4596 XYY -31.6246
....
-------------------------------------------------------------------------------------


Comparison of the makefiles:
-------------------------------------------------------------------------------------
[jason@localhost AMD-fail]$ diff md.make.linux.opteron md.make.linux.opteron-fortran-null
181,182c181,182
< FOPTIMIZE = -O2
< #FOPTIMIZE =
---
> #FOPTIMIZE = -O2
> FOPTIMIZE =
-------------------------------------------------------------------------------------

Any ideas? Thanks in advance.

BR,
J
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Mon Oct 18, 2004 11:29 am    Post subject: Itanium Not Supported

Hi J,


We don't support IA64 (Itanium), so I'm a bit surprised you were able to get anything to run on the SGI machine. Actually, I didn't think IA64 allowed you to run IA32 binaries.

I'm wondering if you really mean AMD64 (Opteron), since you diff two files with opteron in their names.

- Mat
jasonshih



Joined: 03 Aug 2004
Posts: 32

Posted: Mon Oct 18, 2004 5:20 pm

Mat, sorry for my mistake. It's an AMD Opteron instead. The IA32 output comes from a binary compiled and executed on another machine. :-)

BR,
J
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Mon Oct 18, 2004 8:10 pm

No problem. Let's see if we can figure out why you're getting different answers. My best guess is that it's the difference between how x87 and SSE calculate floating point values. x87 uses 80 bits of precision while SSE uses 64 bits. Although all double precision values are stored as 64 bits, at -O2 pgf90 will accumulate values in the x87 registers. So the more calculations are done, the more the extra bits matter.
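As a rough illustration of why the accumulator width can matter, here is a minimal Python sketch. It does not reproduce x87 or SSE behaviour: Python floats are always 64-bit, so `math.fsum` (which tracks the running sum exactly and rounds once at the end) is used purely as a stand-in for a wider accumulator. The data and variable names are made up for the example.

```python
import math

# Ten copies of 0.1; the exact decimal sum would be 1.0.
values = [0.1] * 10

# Round to 64 bits after every addition, as a 64-bit accumulator would.
rounded_each_step = 0.0
for v in values:
    rounded_each_step += v

# math.fsum keeps the running sum exact and rounds only once at the end.
# Assumption: this is only an analogy for accumulating in wider registers.
rounded_once = math.fsum(values)

print(repr(rounded_each_step))
print(repr(rounded_once))
print("identical:", rounded_each_step == rounded_once)
```

The two totals differ only in the last bit, which is the kind of discrepancy the extra x87 bits would be expected to produce on their own.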

To test this theory, try compiling and running with the following flags on your IA32 machine (note I'm assuming your IA32 machine is a Pentium 4 or equivalent): "-O2 -pc 64" and "-O2 -Mscalarsse". Does the output now match the AMD64 machine?

On AMD64 the compiler uses only SSE for floating point, so the results there should match what you get with the two flag sets listed. "-O2 -pc 64" uses the x87 registers but forces the compiler to store the values to memory with each iteration. "-Mscalarsse" tells the compiler to use SSE instead of x87.

Of course, the flaw here is that your answers are very different. You'd expect the difference to be small, since going from 64-bit to 80-bit precision only affects very small values. However, I've seen programs where such values are used as divisors, which can cause greater deviation in the end results.
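To make the divisor point concrete, here is a small Python sketch under the same caveat: it is an analogy, not x87 versus SSE. `math.fsum` again stands in for a wider accumulator, and the `offset` and the "residual used as a divisor" step are hypothetical, chosen only to show how a last-bit difference can grow.

```python
import math

values = [0.1] * 10

narrow = 0.0
for v in values:
    narrow += v               # rounded to 64 bits after each add
wide = math.fsum(values)      # rounded once; stand-in for a wider accumulator

# Hypothetical downstream step: a residual close to zero is shifted by a
# tiny offset and then used as a divisor.
offset = 1e-16
scale_narrow = 1.0 / ((1.0 - narrow) + offset)
scale_wide = 1.0 / ((1.0 - wide) + offset)

ratio = scale_wide / scale_narrow
print(f"relative input difference: {abs(wide - narrow) / wide:.1e}")
print(f"ratio of the two results:  {ratio:.2f}")
```

The inputs agree to roughly one part in 10^16, yet the two quotients differ by about a factor of two, so results that are off by "several times" are not impossible from rounding alone when a near-zero quantity ends up in a denominator.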

Let me know how this works!

- Mat
jasonshih



Joined: 03 Aug 2004
Posts: 32

Posted: Sun Oct 24, 2004 9:19 am

Hi Mat,

Sorry for the delay; it took a couple of days to finish compiling all the source as well as the basis on another P4 machine. However, all test cases show the same result with the two sets of compiler arguments you suggested (-O2 -pc 64 and -O2 -Mscalarsse), rather than the convergence failure encountered on the SGI Opteron. Any further comments?

BR,
J
All times are GMT - 7 Hours