PGI User Forum


-Mcuda=3.1 enables fastmath?

 
TheMatt



Joined: 06 Jul 2009
Posts: 306
Location: Greenbelt, MD

Posted: Wed Aug 25, 2010 7:19 am    Post subject: -Mcuda=3.1 enables fastmath?

In a previous topic, I noted with surprise that my PGI 10.8 install seemed to be using CUDA 2.3 by default even though I have 3.1 available:
Code:
> pgaccelinfo
CUDA Driver Version:           3010

Device Number:                 0
Device Name:                   Tesla T10 Processor
Device Revision Number:        1.3
<snip>
and am using the latest driver:
Code:
> cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  256.44  Thu Jul 29 01:22:44 PDT 2010
GCC version:  gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)

So I did some investigating and found that whenever 3.1 is in the -Mcuda list (-Mcuda=3.1,...), I seem to get fastmath no matter what. For example, if I compile with -Mcuda=ptxinfo,keepgpu,keepbin,keepptx,maxregcount:64,nofma -Kieee, with and without fastmath, I get kernel timings like:
Code:
> grep Kernel Without31-*/cudafor-flxy-SPvDPorig.out
Without31-fastmath/cudafor-flxy-SPvDPorig.out:   Kernel :     67.512 +/-      1.289
Without31-Nofastmath/cudafor-flxy-SPvDPorig.out:   Kernel :    177.938 +/-      2.823
where the fastmath version is about 2.6 times faster. But when I add 3.1 to the list (-Mcuda=3.1,ptxinfo,keepgpu,keepbin,keepptx,maxregcount:64,nofma -Kieee), the two timings are nearly the same:
Code:
> grep Kernel With31-*/cudafor-flxy-SPvDPorig.out
With31-fastmath/cudafor-flxy-SPvDPorig.out:   Kernel :     67.215 +/-      1.344
With31-Nofastmath/cudafor-flxy-SPvDPorig.out:   Kernel :     72.521 +/-      1.173
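To make the comparison concrete, here is a small sketch that computes the ratio of no-fastmath to fastmath kernel time for each build. The numbers are copied from the grep output above; the dictionary keys and the `slowdown` helper are names made up for this illustration, and the error bars are ignored.

```python
# Kernel timings copied from the grep output above
# (units are whatever the benchmark reports; error bars omitted).
timings = {
    ("without 3.1", "fastmath"): 67.512,
    ("without 3.1", "nofastmath"): 177.938,
    ("with 3.1", "fastmath"): 67.215,
    ("with 3.1", "nofastmath"): 72.521,
}


def slowdown(config):
    """Ratio of the no-fastmath kernel time to the fastmath kernel time."""
    return timings[(config, "nofastmath")] / timings[(config, "fastmath")]


ratios = {config: slowdown(config) for config in ("without 3.1", "with 3.1")}

for config, ratio in ratios.items():
    print(f"{config}: no-fastmath / fastmath = {ratio:.2f}")
```

Without 3.1 the ratio comes out near 2.64; with 3.1 it drops to about 1.08, which is why the no-fastmath build looks as if it were using fastmath anyway.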

Now, I know timings aren't proof, but when I compare against the CPU code by counting the array elements that fail a criterion (difference from the CPU value), I get, without 3.1:
Code:
Nofastmath: Num fail:            89  out of:          1782
fastmath: Num fail:           743  out of:          1782

With 3.1 in the -Mcuda list:
Code:
Nofastmath: Num fail:           743  out of:          1782
fastmath: Num fail:           743  out of:          1782
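A check of this kind can be sketched as follows. This is not the code used in the post: the relative-tolerance criterion, the `rel_tol` value, the `failing_indices` helper, and the tiny arrays are all assumptions chosen for illustration. It counts the elements whose GPU value differs from the CPU value by more than a tolerance and records where they occur, so two builds can be compared by location as well as by count.

```python
def failing_indices(gpu, cpu, rel_tol=1e-6):
    """Return the indices where the GPU value differs from the CPU value
    by more than rel_tol, measured relative to the CPU value."""
    fails = []
    for i, (g, c) in enumerate(zip(gpu, cpu)):
        scale = max(abs(c), 1e-30)  # guard against division by zero
        if abs(g - c) / scale > rel_tol:
            fails.append(i)
    return fails


# Made-up data standing in for the real arrays (hypothetical values).
cpu = [1.0, 2.0, 3.0, 4.0]
gpu_nofastmath = [1.0, 2.0, 3.0, 4.00001]
gpu_fastmath = [1.0, 2.00001, 3.0, 4.00001]
gpu_31_nofastmath = [1.0, 2.00001, 3.0, 4.00001]

fails_nofastmath = failing_indices(gpu_nofastmath, cpu)
fails_fastmath = failing_indices(gpu_fastmath, cpu)
fails_31 = failing_indices(gpu_31_nofastmath, cpu)

print(f"Nofastmath: Num fail: {len(fails_nofastmath)} out of: {len(cpu)}")
print(f"fastmath: Num fail: {len(fails_fastmath)} out of: {len(cpu)}")

# Equal counts alone are weak evidence; identical failure locations are stronger.
same_places = fails_fastmath == fails_31
print(f"3.1 no-fastmath build fails in the same places as fastmath: {same_places}")
```

Comparing the index lists rather than just the totals is what distinguishes "the same number of failures" from "the same failures".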

This seems to suggest to me that using -Mcuda=3.1 is enabling fastmath by default since I'm getting the same differences in the same place (not shown, but confirmed). Is this true? And if so, is there a "nofastmath" option for use with 3.1?

Thanks,
Matt
mkcolg



Joined: 30 Jun 2004
Posts: 5871
Location: The Portland Group Inc.

Posted: Fri Aug 27, 2010 2:34 pm

Hi Matt,

I asked Michael about this. There's nothing we've done, but it's possible that the CUDA 3.1 header files have changed. I've added TPR#17203 and asked Michael to investigate.

Thanks,
Mat
All times are GMT - 7 Hours