PGI User Forum

private OpenACC clause on loop, kernels, and parallel constr
Youngsung_



Joined: 04 Dec 2012
Posts: 4

Posted: Tue Dec 04, 2012 11:04 am    Post subject: private OpenACC clause on loop, kernels, and parallel constr

Hi,

After finding that a private clause on a loop construct caused a performance penalty, I have a question about how PGI interprets the private clause.

According to the OpenACC 1.0 standard, the private clause is allowed on the parallel and loop constructs, but not on the kernels construct. And when a private clause is on a loop construct, the variables it lists are supposed to be created for every iteration. Here is my question about using "kernels" with "private": if I want to explicitly declare a list of variables as private within a gang, but do not want them created for every iteration, what is the correct way to combine these constructs and clauses with the kernels construct?

Thanks,

Youngsung
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Tue Dec 04, 2012 11:51 am    Post subject:

Hi Youngsung,

Do you mean that you want to create a variable that is private to a gang but shared amongst the vectors in a gang?
Code:

!$acc kernels
!$acc loop gang private(A)
do i=1, N
!$acc loop vector
  do j=1,M
 ...


Here A is private to each iteration of the "i" loop, but shared amongst the iterations of the "j" loop (i.e., the vectors).

- Mat
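
For readers who want to try the pattern, here is a minimal, self-contained sketch of a gang-private array shared by the vector iterations. It is only an illustration under stated assumptions: the array names (a, b, c), the sizes (n, m), the data clauses, and the arithmetic are all hypothetical and do not come from the thread.
Code:

```fortran
program gang_private_sketch
  implicit none
  ! Hypothetical sizes and arrays, chosen only for this illustration
  integer, parameter :: n = 4, m = 8
  real :: a(m)              ! scratch array: one copy per "i" iteration
  real :: b(m, n), c(m, n)
  integer :: i, j

  ! Fill the input with made-up values
  do i = 1, n
    do j = 1, m
      b(j, i) = real(i + j)
    end do
  end do

  !$acc kernels copyin(b) copyout(c)
  !$acc loop gang private(a)
  do i = 1, n
    ! The vector iterations fill the copy of a that belongs to this i
    !$acc loop vector
    do j = 1, m
      a(j) = 2.0 * b(j, i)
    end do
    ! All vector iterations read that same copy of a
    !$acc loop vector
    do j = 1, m
      c(j, i) = a(j) + a(m + 1 - j)
    end do
  end do
  !$acc end kernels

  print *, c(1, 1), c(m, n)
end program gang_private_sketch
```

Without OpenACC enabled, the directives are ordinary comments and the program runs serially, which gives a reference result to compare an accelerated build against.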
Youngsung_



Joined: 04 Dec 2012
Posts: 4

Posted: Tue Dec 04, 2012 12:24 pm    Post subject:

Hi Mat,

Thanks for your kind explanation. It is good to know that putting the private clause on the gang loop lets the vectors share the variables.

However, my situation is a bit more complicated. Please see my code below:

1 !$acc kernels
2 !$acc loop gang(ngangs) vector(neblk)
3 do ie=1,nelem
4 !$acc loop vector(npts) private(s1,s2,i,j,k,l)
5 do ii=1,npts
6 ... computation using private variables and others

On line #4, I put the private clause, and it caused a performance penalty.
On line #2, I have gang as well as vector. When I moved the private clause from line #4 to line #2, I saw an approx. 10% performance improvement but got a different computation result than before.

Actually, when I deleted the private clause from the source code completely, I got the same result as well as a 2X speed-up. So I am still confused about how PGI handles the private clause.

Thanks,

Youngsung
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Tue Dec 04, 2012 12:42 pm    Post subject:

Quote:
private(s1,s2,i,j,k,l)
These all look like scalars? By default scalars are made local to the generated kernel. This makes them private and has the added benefit that these variables are more likely to be put into a register.

When you add a scalar to a private clause, you are creating an array of these scalars in global memory, where each loop iteration (gang or vector) has its own element. Since the variable is now in global memory, your code slows down.

I've talked with our compiler engineers about this and they agree that we need to rework this implementation. Essentially, we should ignore scalars in a private clause when they are placed on a vector-only loop and instead always make them local to the kernel. For a private on a gang loop, we should be using shared memory instead of global.

We'll probably make this change once the proposed OpenACC 2.0 "default(none)" clause is implemented. Until then, the recommendation is not to put scalars in private clauses unless absolutely necessary.

Hope this helps,
Mat
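
As a rough illustration of that recommendation, the sketch below leaves the scalars out of any private clause and relies on the default handling described above. It is an assumption-laden example, not the original poster's code: the arrays x and y, the sizes, the data clauses, and the arithmetic are hypothetical, and only the names s1, s2, ie, ii, nelem, and npts are borrowed from the thread.
Code:

```fortran
program scalar_private_sketch
  implicit none
  ! Hypothetical sizes and arrays, chosen only for this illustration
  integer, parameter :: nelem = 4, npts = 8
  real :: x(npts, nelem), y(npts, nelem)
  real :: s1, s2
  integer :: ie, ii

  x = 1.0   ! made-up input data

  !$acc kernels copyin(x) copyout(y)
  !$acc loop gang
  do ie = 1, nelem
    ! No private(s1, s2) here. Both scalars are assigned before they
    ! are read in every iteration, so per the explanation above they
    ! are expected to be made local to the generated kernel.
    !$acc loop vector
    do ii = 1, npts
      s1 = x(ii, ie) * real(ii)
      s2 = s1 + real(ie)
      y(ii, ie) = s1 * s2
    end do
  end do
  !$acc end kernels

  print *, y(1, 1), y(npts, nelem)
end program scalar_private_sketch
```

Whether the scalars actually end up in registers depends on the compiler and its version, so it is worth checking the compiler's feedback messages rather than assuming it.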
Youngsung_



Joined: 04 Dec 2012
Posts: 4

Posted: Tue Dec 04, 2012 12:53 pm    Post subject:

I now have a clear idea of how it works! Thanks a lot, Mat.
All times are GMT - 7 Hours
Page 1 of 2