PGI User Forum

CUDA-x86.

cuda-x86 documentation
brentl



Joined: 20 Jul 2004
Posts: 132

Posted: Tue Aug 14, 2012 9:05 am

We have seen huge performance swings between optimized and non-optimized kernels, as defined above. If the kernels are small, the thread blocks contain a large number of CUDA threads, and the kernel is not optimized, then the x86 tasks spend essentially all of their time context switching, and you get lousy performance.

If the kernels are optimized (kernel optimization is enabled starting at -O2, but we recommend you use -fast), then you should see performance comparable to a decent OpenMP implementation of the same algorithm on that hardware.
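
To make the OpenMP comparison concrete, here is a rough sketch of the kind of baseline I mean. It is only an illustration: the thread does not name an algorithm, so SAXPY and the function name `saxpy_openmp` are assumptions picked because they match the "small kernel, one element per CUDA thread" case described above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical baseline (not from the original post): y[i] = a * x[i] + y[i].
// This is the loop form of a small CUDA kernel in which each thread updates
// one element. SAXPY is assumed here purely for illustration.
void saxpy_openmp(float a, const std::vector<float>& x, std::vector<float>& y) {
    const long n = static_cast<long>(std::min(x.size(), y.size()));

    // With a static schedule each OpenMP thread works through a contiguous
    // chunk of iterations, so there is no per-element task switching.
    // If OpenMP is not enabled at compile time the pragma is ignored and
    // the loop simply runs serially with the same result.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Timing a loop like this against the CUDA-x86 build of the equivalent kernel, both compiled with -fast, should give a rough idea of whether the kernel is hitting the optimized path.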
All times are GMT - 7 Hours