PGI User Forum
Loop optimization question

 
szczelba



Joined: 29 Jun 2010
Posts: 26

Posted: Tue Apr 05, 2011 7:14 am    Post subject: Loop optimization question

Hello,

I'm trying to understand the concepts of effective loop parallelization in PGI Accelerator.
I have read about the "parallel" and "vector" clauses. If I understand correctly, the "parallel" clause means that the iterations will be executed simultaneously on the accelerator. The number of concurrently executed iterations cannot be greater than the number of cores on the GPU, right?
The "vector" clause means that the iterations will be executed simultaneously but with synchronization, so there will be some synchronization across iterations. Shouldn't that slow down the computation a bit?
What is the parameter of the "vector" clause? It determines "how many iterations are in a vector", but what does that mean? Can it be larger than the number of GPU cores?

Then, when I try to accelerate a simple loop like:

Code:
!$acc region do parallel
      do i=1,n
          a(i) = a(i)+2
      enddo
!$acc end region


(assuming that the array a is initialized earlier), I get the following compiler message:
Quote:
"Non-stride-1 access for array a"

Isn't it a stride-1 access?

I've also tested an example code that I've found:

Code:

! Simple loop nest with poor cache use:
!$acc region do parallel
do i=1,n
  do j=1,n
    a(i,j) = b(i,j)
  enddo
enddo
!$acc end region

! Reversed loop nest to achieve stride-1 access:
!$acc region do parallel
do j=1,n
  do i=1,n
    a(i,j) = b(i,j)
  enddo
enddo
!$acc end region


The same "non-stride-1 access" message appears for both the first and the second loop nest. I also see that when I leave out the "parallel" clause, the compiler automatically adds "parallel, vector(...)". Why?
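
On the stride question: Fortran stores arrays in column-major order, so consecutive values of the first index are adjacent in memory. A minimal sketch of the address arithmetic behind the two loop nests above, in plain Fortran with no directives. The program name `stride_demo`, the helper `offset`, and n = 4 are illustrative choices and do not come from the original example.

```fortran
program stride_demo
  implicit none
  integer, parameter :: n = 4
  integer :: i, j

  ! Element (i,j) of an n-by-n array sits at linear offset
  ! (i-1) + (j-1)*n from the start of the array (column-major order).

  ! First nest: the inner loop runs over j while i is fixed.
  print *, 'inner loop over j, i = 1:'
  print *, (offset(1, j), j = 1, n)   ! 0 4 8 12, a stride of n

  ! Reversed nest: the inner loop runs over i while j is fixed.
  print *, 'inner loop over i, j = 1:'
  print *, (offset(i, 1), i = 1, n)   ! 0 1 2 3, a stride of 1

contains

  ! Linear offset of element (i,j), counted in array elements.
  integer function offset(i, j)
    integer, intent(in) :: i, j
    offset = (i - 1) + (j - 1) * n
  end function offset

end program stride_demo
```

On an accelerator, which loop index ends up in the vector (thread) dimension also matters, so reordering the loops alone may not be enough to change the message when only the outer loop is scheduled as "parallel".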
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue Apr 05, 2011 9:22 am    Post subject: Re: Loop optimization question

Hi szczelba,

Your understanding of 'parallel' and 'vector' as applied to an NVIDIA GPU is a bit off. 'Parallel' corresponds to thread blocks, which are scheduled on a streaming multiprocessor, while 'vector' corresponds to the threads within a block, which are scheduled on the individual cores. The following article is a good primer on the NVIDIA threading model and should help.

http://www.pgroup.com/lit/articles/insider/v2n1a5.htm

- Mat
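
As a rough illustration of the block/thread mapping described in the reply, the first loop from the question could be given an explicit schedule. This is only a sketch: it assumes the `do parallel, vector(width)` clause form that the compiler feedback reports, the width of 256 and the program name `vector_schedule` are arbitrary choices, and it is not necessarily the schedule the compiler would pick on its own.

```fortran
program vector_schedule
  implicit none
  integer, parameter :: n = 1024
  real :: a(n)
  integer :: i

  a = 1.0

  ! "parallel" distributes strips of iterations across thread blocks;
  ! "vector(256)" runs 256 consecutive iterations as the threads of one block.
  ! The width 256 is illustrative only (an assumption, not a recommendation).
!$acc region
!$acc do parallel, vector(256)
  do i = 1, n
     a(i) = a(i) + 2
  end do
!$acc end region

  ! Host-side check: every element started at 1.0 and had 2 added.
  if (any(a /= 3.0)) error stop 'unexpected result'
  print *, 'ok, a(1) =', a(1)
end program vector_schedule
```

When the code is built without accelerator support, the `!$acc` lines are ordinary comments and the loop simply runs on the host, which makes it easy to compare results with and without the directives.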
All times are GMT - 7 Hours