PGI User Forum

Different answers with "-fast" and "-fastsse"

 
Forum Administrator



Joined: 09 Jul 2004
Posts: 56
Location: Lake Oswego, OR

Posted: Fri Jul 16, 2004 1:02 pm    Post subject: Different answers with "-fast" and "-fastsse"

I've compiled and run my code on a P4 running SuSE9.0 using both "-fast" and "-fastsse" but noticed that I get slightly different answers.

Example:

PROGRAM p
REAL*8 res
INTEGER i
res = 1.0
DO i = 1, 200
res = res * 0.314
END DO
WRITE(*,*) "Results: ", res, "\n"
END

pgf90 -fast a.f -> Results: 2.418261068169633E-101
pgf90 -fastsse a.f -> Results: 2.418261068169615E-101

Why the different values?
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Thu Jul 22, 2004 4:42 pm    Post subject: x87 vs SSE

There are actually several differences between "-fast" and "-fastsse" that can result in different answers when running the same code. First off, -fast and -fastsse are each shorthand for a set of optimizations that generally gives the best performance. -fast is "-O2 -Munroll=c:1 -Mnoframe -Mlre", and -fastsse is -fast plus "-Mscalarsse -Mvect=sse -Mcache_align -Mflushz".

The biggest difference is the "-Mscalarsse -Mvect=sse" flags, which tell the compiler to generate SSE code, while -fast alone generates x87 code. SSE is generally faster because it can perform multiple floating-point calculations per clock cycle, whereas x87 performs only one 80-bit calculation per cycle and is harder to generate optimized code for.

One reason you're seeing precision differences is that, for double-precision floating-point values, SSE works with 64-bit values while x87 uses 80-bit registers. Although values are truncated to 64 bits when stored to memory, a good compiler will try to keep values in the x87 registers. The more calculations are done, the more impact the extra bits have. Also, SSE code may use different algorithms, which can give slightly different results.
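To put a number on "slightly different", here is a minimal sketch that takes the two results reported in the question and prints their relative difference, both as a plain number and in units of the double-precision machine epsilon. The program name and variable names are just for illustration, and it is written as free-form source, so it assumes a file extension such as .f90.

```fortran
PROGRAM reldiff
  IMPLICIT NONE
  ! The two values are copied from the output posted in the question.
  REAL*8 :: x87_res, sse_res, rel

  x87_res = 2.418261068169633D-101   ! reported for pgf90 -fast
  sse_res = 2.418261068169615D-101   ! reported for pgf90 -fastsse

  ! Relative difference between the two answers.
  rel = ABS(x87_res - sse_res) / ABS(x87_res)

  WRITE(*,*) "Relative difference:   ", rel
  ! EPSILON gives the spacing of double-precision values near 1.0,
  ! so this ratio shows how many "last bits" the answers differ by.
  WRITE(*,*) "In units of EPSILON:   ", rel / EPSILON(x87_res)
END PROGRAM reldiff
```

The relative difference should come out on the order of 1E-14, which is a few tens of machine epsilons. That is the kind of gap you would expect when 200 multiplications are each rounded at a different width, and it is a reasonable scale to use if you compare results from the two builds with a tolerance instead of expecting them to match bit for bit.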

The FAQ section has a more detailed guide on precision issues on x86 systems that you might want to read (see [url]/support/execute.htm#precision[/url]).

- Mat
All times are GMT - 7 Hours