PGI User Forum


Same worksharing type in nested loops - parallel construct

 
xray



Joined: 21 Jan 2010
Posts: 84

Posted: Thu Feb 21, 2013 2:02 am    Post subject: Same worksharing type in nested loops - parallel construct

Hi,
With the kernels construct, I can specify a "gang vector" loop schedule for both loops of a nested loop:
Code:
#pragma acc kernels
#pragma acc loop gang vector
for( int j = 0; j < n; j++ )
{
    #pragma acc loop gang vector
    for( int i = 0; i < m; i++ ) {...}
}

The compiler then uses a two-dimensional grid and two-dimensional blocks, which is exactly what I want:
Code:
         67, #pragma acc loop gang, vector(2) /* blockIdx.y threadIdx.y */
         70, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */


However, if I use the parallel construct instead of kernels, I get an error message and the inner loop schedule is ignored:
Code:
PGC-S-0155-Nested loops cannot have the same worksharing type  (file.c: 67)
[..]
67, #pragma acc loop gang, vector(256) /* blockIdx.x threadIdx.x */


Why do I get this error when it apparently worked nicely (and as expected) with the kernels construct?
How can I get two-dimensional grids and two-dimensional blocks with the parallel construct?
Bye, Sandra
xray



Joined: 21 Jan 2010
Posts: 84

Posted: Thu Feb 28, 2013 4:14 am

Any news?
Michael Wolfe



Joined: 19 Jan 2010
Posts: 42

Posted: Thu Feb 28, 2013 3:57 pm

Sandra: This is the defined behavior for the parallel construct, where the loop directive works more like the OpenMP loop construct (omp for or omp do). The kernels construct essentially allows tiling. For the parallel construct, we're adding an explicit tile clause for nested loops in the next OpenACC version, which should give you the behavior you want.
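
For readers who want to try the tile clause mentioned above, here is a minimal sketch of how it might be applied to a two-level loop nest inside a parallel construct. It assumes a compiler that supports OpenACC 2.0 or later. The function name scale2d, the factor parameter, the loop body, and the tile sizes are all hypothetical and chosen only for illustration (the sizes mirror the 2 x 128 block shape reported for the kernels version). To my understanding of the OpenACC specification, the first tile size applies to the innermost loop and the last to the outermost, but please check that against your compiler's documentation.

```c
#include <stddef.h>

/* Hypothetical example: multiply every element of an n x m array
   (stored row-major in a) by factor. Name and body are illustrative. */
void scale2d(int n, int m, float *restrict a, float factor)
{
    /* tile(128, 2): assumed mapping is 128 for the inner i loop and
       2 for the outer j loop. The two loops must be tightly nested.
       Without OpenACC enabled, the pragma is ignored and the loops
       simply run sequentially on the host. */
    #pragma acc parallel loop tile(128, 2) copy(a[0:n*m])
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            a[(size_t)j * m + i] *= factor;
        }
    }
}
```

If you only need distinct parallelism levels rather than tiling, putting gang on the outer loop and vector on the inner loop should also be accepted inside a parallel construct, since the two loops then no longer share a worksharing type.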