PGI User Forum


Unstructured: Reading vs. Writing

 
PGI User Forum Forum Index -> Accelerator Programming
elephant
Joined: 24 Feb 2011, Posts: 22

Posted: Wed Aug 17, 2011 4:53 am    Post subject: Unstructured: Reading vs. Writing

I have a question about reading and writing memory in an unstructured manner in an accelerator region:

I noticed that if I read an array (A3_GPU) in an unordered fashion inside a loop, the loop is still parallelizable and the performance is not that bad.
Q1 takes unordered values, e.g.:

i=1 : Q1=2345
i=2 : Q1=12
i=3 : Q1=18474
and so on....
Code:

!reading unstructured
!$acc region
         do i = 1,100000     
             Q1 = A1_GPU(i,1)
             A2_GPU(i,3) = A3_GPU(Q1,1)             
         end do                   
!$acc end region
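The reading loop is a gather: iteration i writes only A2_GPU(i,3), so even when several iterations read the same element of A3_GPU, no two iterations ever write the same location, and the order in which iterations run cannot change the result. The sketch below illustrates this in plain Fortran without accelerator directives. The module and function names (gather_demo, gather_forward, gather_backward) are hypothetical, and the 1-D arrays src, idx and out stand in for A3_GPU, the index column of A1_GPU, and A2_GPU.

```fortran
module gather_demo
  implicit none
contains
  ! Gather: out(i) = src(idx(i)). Each iteration writes only out(i),
  ! so a repeated value in idx means a repeated read, never a conflicting write.
  function gather_forward(src, idx) result(out)
    real, intent(in)    :: src(:)
    integer, intent(in) :: idx(:)   ! assumed to lie in 1..size(src)
    real :: out(size(idx))
    integer :: i
    do i = 1, size(idx)
      out(i) = src(idx(i))
    end do
  end function gather_forward

  ! The same gather with the iterations executed in reverse order.
  function gather_backward(src, idx) result(out)
    real, intent(in)    :: src(:)
    integer, intent(in) :: idx(:)
    real :: out(size(idx))
    integer :: i
    do i = size(idx), 1, -1
      out(i) = src(idx(i))
    end do
  end function gather_backward
end module gather_demo
```

For example, gather_forward([10.0, 20.0, 30.0], [3, 1, 3, 2]) returns [30.0, 10.0, 30.0, 20.0], and gather_backward returns the same array even though index 3 is read twice.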

But when I want to write an array (B2_GPU) with an unstructured pattern, then the compiler forces the loop to execute sequentially on the device (!$acc do seq), which gives me very bad performance.
The loop looks like the following, and K1 is unordered, e.g.:

i=1 : K1=2345
i=2 : K1=12
i=3 : K1=18474
Code:

!writing unstructured
!$acc region
         do i = 1,100000     
             K1 = B1_GPU(i,1)
             B2_GPU(K1,3) = B3_GPU(i,1)             
         end do                   
!$acc end region

Is there any workaround? Or at least a way to tune such a loop?
What does "width" mean if I use the directive !$acc do seq [(width)]?
Copying the data to the host, executing the loop on the CPU and copying it back to the device is not an option; I guess that would take even more time.

Thank you very much!
mkcolg
Joined: 30 Jun 2004, Posts: 6211
Location: The Portland Group Inc.

Posted: Wed Aug 17, 2011 8:08 am

Hi elephant,
Quote:

But when I want to write an array (B2_GPU) with an unstructured pattern, then the compiler forces the loop to execute sequentially on the device (!$acc do seq),

For a computed index, the compiler has no way of knowing at compile time whether all the values of K1 are unique. Hence, it must assume the worst case: that different iterations can produce the same K1 and write the same element of B2_GPU. Therefore the loop is not safe to parallelize.
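To see why repeated indices matter, the sketch below runs the writing loop (a scatter) sequentially in forward and in reverse order. Two orders are only a stand-in for the many interleavings parallel threads could produce. The module name scatter_demo, the function scatter and its reverse flag are hypothetical. With unique indices both orders agree; with a repeated index, whichever iteration writes last wins, so the result depends on execution order.

```fortran
module scatter_demo
  implicit none
contains
  ! Scatter: out(idx(i)) = src(i), starting from a copy of dst.
  ! reverse = .true. runs the iterations from last to first.
  ! If idx repeats a value, the last iteration to write that element wins.
  function scatter(dst, src, idx, reverse) result(out)
    real, intent(in)    :: dst(:), src(:)
    integer, intent(in) :: idx(:)   ! assumed to lie in 1..size(dst)
    logical, intent(in) :: reverse
    real :: out(size(dst))
    integer :: i, first, last, step
    out = dst
    if (reverse) then
      first = size(idx); last = 1; step = -1
    else
      first = 1; last = size(idx); step = 1
    end if
    do i = first, last, step
      out(idx(i)) = src(i)
    end do
  end function scatter
end module scatter_demo
```

With dst = [0.0, 0.0, 0.0] and src = [1.0, 2.0, 3.0], the unique indices [3, 1, 2] give [2.0, 3.0, 1.0] in both orders, while the repeated indices [2, 2, 1] give [3.0, 2.0, 0.0] forward and [3.0, 1.0, 0.0] in reverse.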

Quote:
Is there any workaround?
Yes, the "independent" clause is your way of asserting to the compiler that all index values are independent of each other and it's ok to parallelize.

Code:
!$acc region
!$acc do independent
         do i = 1,100000     
             K1 = B1_GPU(i,1)
             B2_GPU(K1,3) = B3_GPU(i,1)             
         end do                   
!$acc end region
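The independent clause is an assertion the compiler does not verify: if two iterations do produce the same K1, the parallel writes to B2_GPU race and the result is unpredictable. When the index table is built at run time, one option is to check it once on the host before relying on the clause. A minimal sketch, assuming the indices are available as a 1-D integer array idx and that the written dimension of B2_GPU has extent n; the names index_check and indices_unique are hypothetical.

```fortran
module index_check
  implicit none
contains
  ! Returns .true. only if every value in idx lies in 1..n and no value
  ! repeats, i.e. each iteration of the scatter writes a distinct element.
  logical function indices_unique(idx, n)
    integer, intent(in) :: idx(:), n
    logical :: seen(n)   ! seen(k) becomes .true. once index k has been used
    integer :: i
    seen = .false.
    indices_unique = .false.
    do i = 1, size(idx)
      if (idx(i) < 1 .or. idx(i) > n) return   ! out of bounds
      if (seen(idx(i))) return                 ! repeated index
      seen(idx(i)) = .true.
    end do
    indices_unique = .true.
  end function indices_unique
end module index_check
```

For example, indices_unique([3, 1, 2], 3) is .true., while indices_unique([2, 2, 1], 3) is .false. because index 2 repeats.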

Quote:

What does "width" mean if I use the directive !$acc do seq [(width)]?
It's the width to which the compiler has strip-mined the loop. Strip mining splits a loop into an outer loop over fixed-size strips and a small inner loop that works on one strip at a time. This allows the data for a strip to be kept in cache or, in the GPU case, in shared memory.
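As a rough illustration of the transformation itself, the sketch below writes a simple loop by hand in its plain and strip-mined forms, with w playing the role of width. In the PGI model the compiler performs this rewriting itself, so this is not code you would normally write. The module and function names are hypothetical, and w is assumed to be at least 1.

```fortran
module strip_mine_demo
  implicit none
contains
  ! Plain loop: y(i) = 2*x(i) for i = 1..size(x).
  function scale_plain(x) result(y)
    real, intent(in) :: x(:)
    real :: y(size(x))
    integer :: i
    do i = 1, size(x)
      y(i) = 2.0 * x(i)
    end do
  end function scale_plain

  ! The same loop strip-mined with strip width w (w >= 1 assumed):
  ! the outer loop steps from the start of one strip to the next,
  ! the inner loop covers a single strip, and min() shortens the
  ! last strip when w does not divide size(x).
  function scale_strip_mined(x, w) result(y)
    real, intent(in)    :: x(:)
    integer, intent(in) :: w
    real :: y(size(x))
    integer :: istart, i, n
    n = size(x)
    do istart = 1, n, w
      do i = istart, min(istart + w - 1, n)
        y(i) = 2.0 * x(i)
      end do
    end do
  end function scale_strip_mined
end module strip_mine_demo
```

Both functions return the same array for any w >= 1; for example, with x = [1.0, 2.0, 3.0, 4.0, 5.0] and w = 2, the strips cover elements 1-2, 3-4 and 5, and the result is [2.0, 4.0, 6.0, 8.0, 10.0].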

Hope this helps,
Mat
elephant
Joined: 24 Feb 2011, Posts: 22

Posted: Wed Aug 17, 2011 8:56 am

Very good! Thank you! Can't wait to implement this "independent" clause and see the performance gain!!!!
Excellent...
All times are GMT - 7 Hours