PGI User Forum
good values for width in vector and parallel directives

 
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Mon Nov 23, 2009 7:49 pm    Post subject: good values for width in vector and parallel directives

Hi all,
When we want to tell the compiler explicitly how to compile our code for the accelerator, the parallel(width) and vector(width) clauses are what concern me.

Looking at the "width" values the compiler chooses automatically, I see different values such as 8, 16, 128, or sometimes 256. I'm not sure how to pick a good value for "width" if we want to set it explicitly ourselves.

Example:
Code:
!$acc region
!$acc do parallel(16)
   do i=1, 100
     ....
   enddo
!$acc end region


Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Nov 24, 2009 10:38 am

Hi Tuan,

The parallel and vector clauses control how your compute kernels are scheduled on the accelerator. "parallel" sets the number of threads spread across the multiprocessors, and "vector" sets the number of SIMD (vector) threads.
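
One way to picture the two widths is as strip-mining. The sketch below is a conceptual model in Python, not the compiler's actual schedule. It assumes the loop is cut into strips of vector_width iterations and that the strips are dealt round-robin to at most parallel_width multiprocessor slots. The names strip_mine and idle_lanes, and the round-robin assignment, are assumptions made for illustration only.

```python
import math


def strip_mine(n, vector_width, parallel_width=None):
    """Model how n loop iterations might map to (slot, lane) pairs.

    ASSUMPTION: strips of `vector_width` iterations are assigned
    round-robin to at most `parallel_width` slots. A real compiler
    may schedule strips differently.
    """
    num_strips = math.ceil(n / vector_width)
    # With no parallel width given, every strip gets its own slot.
    slots = num_strips if parallel_width is None else min(parallel_width, num_strips)
    mapping = []
    for i in range(n):
        strip = i // vector_width   # which strip this iteration falls in
        lane = i % vector_width     # position inside the strip
        mapping.append((strip % slots, lane))
    return num_strips, slots, mapping


def idle_lanes(n, vector_width):
    """Vector lanes left without work in the final, partial strip."""
    return math.ceil(n / vector_width) * vector_width - n


if __name__ == "__main__":
    # The 100-iteration loop from the question with a vector width of 16.
    num_strips, slots, mapping = strip_mine(100, 16)
    print(num_strips, slots, idle_lanes(100, 16))  # 7 strips, 7 slots, 12 idle lanes
```

In this model a width that does not divide the trip count leaves part of the last strip idle, which is one reason the best width depends on the loop and has to be measured.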

In general, the state-of-the-art way to choose the optimal schedule for a GPU is to try all options. As Michael Wolfe likes to say, for you students, there is a PhD thesis here.

The compiler will generate a default schedule, but there is no guarantee that it's optimal. The best thing to do is to experiment and see whether you can improve the performance.

- Mat
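
The "try all options" advice can be sketched as a small exhaustive sweep. This is a minimal illustration in Python, not a PGI tool: sweep and fake_measure are hypothetical names, and fake_measure is a placeholder cost function, not real timing data. In practice, measure would rebuild the program with the given widths in the directives, run it, and return the elapsed time.

```python
import itertools


def sweep(measure, parallel_widths, vector_widths):
    """Try every (parallel, vector) width pair and return the fastest.

    `measure(parallel_width, vector_width)` is a caller-supplied
    function returning a cost (for example, seconds); lower is better.
    """
    timings = {}
    for p, v in itertools.product(parallel_widths, vector_widths):
        timings[(p, v)] = measure(p, v)
    best = min(timings, key=timings.get)
    return best, timings


def fake_measure(parallel_width, vector_width):
    # PLACEHOLDER cost model for demonstration only; it is not measured
    # data and says nothing about which widths are good on real hardware.
    return abs(parallel_width - 16) + abs(vector_width - 128)


if __name__ == "__main__":
    best, timings = sweep(fake_measure, [8, 16, 32], [16, 128, 256])
    for cfg, cost in sorted(timings.items()):
        print(cfg, cost)
    print("best:", best)
```

An exhaustive sweep grows with the product of the candidate lists, so it only stays practical for a handful of widths per loop. Starting from the values the compiler reports and trying their neighbors keeps the search small.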
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Tue Nov 24, 2009 11:59 am

Thanks a lot, Mat.
All times are GMT - 7 Hours