PGI User Forum

Computing multiple elements per thread in OpenACC

 
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Wed Feb 20, 2013 7:27 am    Post subject: Computing multiple elements per thread in OpenACC

Hi,
assume we have the following code:
Code:
#pragma acc kernels
#pragma acc loop gang(16) vector(32)
for (int i=0; i<2048; i++) {
  // do something with array[i]
}

With PGI Compiler 12.9, this created a grid of 16 blocks with 32 threads each, so that each of the 512 CUDA threads processed 4 elements.
With PGI Compiler 13.1, this is no longer possible. If I specify both the vector and the gang size, the gang size is ignored at run time (although the compiler feedback still reports that it uses 16 gangs). Instead, 13.1 automatically executes the loop with a grid size of 64 (and vector size 32), which works out to one element per thread.
Is this a bug or intended behavior? If it is intended, why?
Kind regards, Sandra
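
To make the arithmetic in the question concrete, here is a minimal host-side sketch in plain C. It is not PGI's actual scheduling: the function names `elements_per_thread` and `emulate_fixed_launch` are hypothetical, and the cyclic (strided) distribution of iterations is an assumption, since the thread does not say how 12.9 assigned iterations to threads.

```c
#include <assert.h>
#include <stddef.h>

/* Number of iterations each thread handles when n iterations are spread
   over gangs * vector threads (rounded up). Hypothetical helper. */
size_t elements_per_thread(size_t n, size_t gangs, size_t vector)
{
    assert(gangs > 0 && vector > 0);
    size_t threads = gangs * vector;
    return (n + threads - 1) / threads;
}

/* Host-side emulation of a fixed-size launch: every (gang, lane) pair
   walks the index space with a stride of gangs * vector.
   ASSUMPTION: cyclic distribution; a real compiler may block instead. */
void emulate_fixed_launch(int *hits, size_t n, size_t gangs, size_t vector)
{
    size_t stride = gangs * vector;
    for (size_t g = 0; g < gangs; g++) {
        for (size_t v = 0; v < vector; v++) {
            for (size_t i = g * vector + v; i < n; i += stride) {
                hits[i] += 1; /* stands in for "do something with array[i]" */
            }
        }
    }
}
```

With n = 2048, `elements_per_thread(2048, 16, 32)` gives 4 (the 12.9 behavior described above), while `elements_per_thread(2048, 64, 32)` gives 1 (the 13.1 behavior).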
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Wed Feb 20, 2013 4:34 pm    Post subject:

Hi Sandra,

No, this doesn't look correct. I've opened up a problem report (TPR#19149) and sent it to our engineers for further investigation.

Thanks!
Mat
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Thu Feb 21, 2013 2:49 am    Post subject:

Thanks.
Just one addition: if I use a gang schedule for the outer loop and a vector schedule for the inner loop of a loop nest, and specify both sizes, the specified number of gangs is also ignored:
Code:
#pragma acc parallel vector_length(64) num_gangs(128)
#pragma acc loop gang
        for( int j = 0; j < n; j++)
        {
#pragma acc loop vector
            for( int i = 0; i < m; i++ ) {..}
        }

The output of ACC_NOTIFY shows block=64 but grid=8190 (which is n in my case) instead of the 128 gangs I requested.
Sandra
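
One way to make each gang process several outer iterations regardless of how the compiler interprets `num_gangs` is to strip-mine the outer loop by hand. The sketch below is an illustration only and is not proposed anywhere in this thread: the function name `add_one_strip_mined`, the loop body, the row-major layout, and the `copy` clause are all assumptions, and whether a given PGI release then launches exactly 128 blocks has not been verified here.

```c
#include <assert.h>

/* Must match the literal in num_gangs(...) below; kept as a literal in the
   pragma because macro expansion inside pragmas is compiler-dependent. */
enum { NUM_GANGS = 128 };

/* Adds 1.0f to every element of a row-major n x m array (hypothetical body).
   The gang loop has a fixed trip count of NUM_GANGS; each gang then walks
   rows g, g + NUM_GANGS, g + 2 * NUM_GANGS, ... sequentially. */
void add_one_strip_mined(float *a, int n, int m)
{
    assert(n >= 0 && m >= 0);
#pragma acc parallel num_gangs(128) vector_length(64) copy(a[0:n*m])
#pragma acc loop gang
    for (int g = 0; g < NUM_GANGS; g++) {
#pragma acc loop seq
        for (int j = g; j < n; j += NUM_GANGS) {
#pragma acc loop vector
            for (int i = 0; i < m; i++) {
                a[j * m + i] += 1.0f;
            }
        }
    }
}
```

Without an OpenACC-enabled compiler the pragmas are ignored and the function runs as ordinary sequential C, which makes it easy to check the indexing on the host first.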
jtull



Joined: 30 Jun 2004
Posts: 445

Posted: Fri May 17, 2013 4:48 pm    Post subject:

Sandra,

TPR 19149 has been fixed in the current 13.5 release.

dave
All times are GMT - 7 Hours