PGI User Forum

about gang and worker

 
Teslalady



Joined: 16 Mar 2012
Posts: 75

Posted: Mon Nov 19, 2012 3:46 am    Post subject: about gang and worker

The code is as follows:
!$acc kernels
DO j=jtf,jtg
  DO k=kts,kte
    DO i=its,ite
      work(i,k,j)=dc05*(u0(i,k,j)+u0(i-1,k,j))
    ENDDO
  ENDDO
ENDDO
!$acc end kernels


The compiler feedback is:
3420, Loop is parallelizable
3421, Loop is parallelizable
3422, Loop is parallelizable
Accelerator kernel generated
3420, Cached references to size [(x+1)x(y)] block of 'u0'
3421, !$acc loop gang, vector(4) ! blockidx%y threadidx%y
3422, !$acc loop gang, vector(64) ! blockidx%x threadidx%x
CC 1.0 : 26 registers; 112 shared, 8 constant, 0 local memory bytes
CC 2.0 : 22 registers; 0 shared, 124 constant, 0 local memory bytes


My questions are:
1. Is the j loop parallelized at the gang level? Why did the compiler not allocate any worker-level parallelism?

2. The message for line 3420 says "Cached references to size [(x+1)x(y)] block of 'u0'". What does that mean? Is this the result of the directive $acc cache (u0)?
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Nov 19, 2012 9:56 am    Post subject: about gang and worker

Hi Teslalady,

From the compiler feedback messages, it appears to me that the compiler has built the parallel schedule from the k and i loops (lines 3421 and 3422) and put the j loop (line 3420) inside the kernel, where it runs sequentially. The reasoning is that the loop body doesn't have much work, so serializing j increases the amount of work each kernel thread does. Since the code also uses cached memory, this helps increase the computational intensity.

While not guaranteed, the compiler usually does a good job of finding a good schedule. You can, however, use the loop directives to override the compiler's default schedule if you want to try others.
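As a minimal sketch of overriding the schedule, the program below wraps the loop nest from the question and asks for gang parallelism on j, worker parallelism on k, and vector parallelism on i. The program name, the array bounds, and the fill values are assumptions made up for illustration, and the clause choices are just one schedule to try, not a recommendation. Whether it beats the default has to be measured.

```fortran
program loop_schedule_sketch
  implicit none
  ! Hypothetical bounds; the original post does not give them.
  integer, parameter :: its = 1, ite = 128
  integer, parameter :: kts = 1, kte = 32
  integer, parameter :: jtf = 1, jtg = 16
  real, parameter :: dc05 = 0.5
  ! u0 starts at its-1 so that u0(i-1,k,j) is in bounds for i = its.
  real :: u0(its-1:ite, kts:kte, jtf:jtg)
  real :: work(its:ite, kts:kte, jtf:jtg)
  integer :: i, j, k

  u0 = 1.0
  work = 0.0

  !$acc kernels copyin(u0) copyout(work)
  !$acc loop gang
  do j = jtf, jtg
    !$acc loop worker
    do k = kts, kte
      !$acc loop vector(64)
      do i = its, ite
        work(i,k,j) = dc05*(u0(i,k,j) + u0(i-1,k,j))
      end do
    end do
  end do
  !$acc end kernels

  ! With u0 = 1 everywhere, every element of work should be 1.
  if (maxval(abs(work - 1.0)) > 1.0e-6) error stop 'unexpected result'
  print *, 'ok'
end program loop_schedule_sketch
```

Building with the accelerator feedback enabled (for PGI, -acc -Minfo=accel) shows which schedule the compiler actually generated, so it can be compared against the default one quoted above. Without OpenACC enabled the directives are treated as comments and the program runs on the host.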

Quote:
Why did the compiler not allocate any worker-level parallelism?
The "worker" construct on an Nvidia device corresponds to the warp size. The warp size is fixed at 32, so it can't be changed.

Quote:
2. The message for line 3420 says "Cached references to size [(x+1)x(y)] block of 'u0'". What does that mean? Is this the result of the directive $acc cache (u0)?
This is the compiler auto-detecting where to apply caching. You could use the cache directive, but the PGI compiler will automatically find opportunities.
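For comparison, here is a rough sketch of what requesting the caching by hand could look like. It assumes the OpenACC form of the cache directive, placed at the top of the loop body and naming the two u0 elements each iteration reads. The program name, bounds, and values are hypothetical, and the compiler is free to ignore the hint or to cache a different block.

```fortran
program cache_sketch
  implicit none
  ! Hypothetical bounds; the original post does not give them.
  integer, parameter :: its = 1, ite = 128
  integer, parameter :: kts = 1, kte = 32
  integer, parameter :: jtf = 1, jtg = 16
  real, parameter :: dc05 = 0.5
  real :: u0(its-1:ite, kts:kte, jtf:jtg)
  real :: work(its:ite, kts:kte, jtf:jtg)
  integer :: i, j, k

  u0 = 2.0
  work = 0.0

  !$acc kernels copyin(u0) copyout(work)
  do j = jtf, jtg
    do k = kts, kte
      do i = its, ite
        ! Hint: keep u0(i-1,k,j) and u0(i,k,j) in fast on-chip memory.
        !$acc cache(u0(i-1:i, k, j))
        work(i,k,j) = dc05*(u0(i,k,j) + u0(i-1,k,j))
      end do
    end do
  end do
  !$acc end kernels

  ! With u0 = 2 everywhere, every element of work should be 2.
  if (maxval(abs(work - 2.0)) > 1.0e-6) error stop 'unexpected result'
  print *, 'ok'
end program cache_sketch
```

The "Cached references" line in the compiler feedback is the place to check whether the hint changed anything.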

- Mat
Teslalady



Joined: 16 Mar 2012
Posts: 75

Posted: Tue Nov 20, 2012 9:12 am    Post subject: about gang and worker

Thanks Mat!

I am a bit confused about the copy clause. Does the copy clause copy data from the host to the global memory of the GPU?

Thanks again
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Wed Nov 21, 2012 9:26 am    Post subject: about gang and worker

The "copy" clause allocates memory on the device and then copies the data to the device's global memory at the start of the region (data or compute). At the end of the region, the data is copied back to the host and the memory is deallocated on the device.

The "copyin" clause copies the data to the device but does not copy it back to the host, while the "copyout" clause only copies the data back to the host and does not copy it to the device. Both allocate the device memory at the start of the region and deallocate it at the end.

The "create" clause only allocates and deallocates the device memory, but does not perform any copies.

The "update" directive can be used within a data region to copy data to/from the device at specific points in your program.
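A small sketch that puts the four clauses and the update directive together is shown below. The array names, the size n, and the arithmetic are all made up for illustration; only the directives themselves come from the description above.

```fortran
program data_clauses_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical size
  real :: a(n), b(n), c(n), tmp(n)
  integer :: i

  a = 1.0
  b = 2.0
  c = 0.0

  ! a   : copied to the device on entry, copied back on exit
  ! b   : copied to the device on entry only
  ! c   : allocated on entry, copied back to the host on exit
  ! tmp : device scratch space, never copied in either direction
  !$acc data copy(a) copyin(b) copyout(c) create(tmp)

  !$acc kernels
  do i = 1, n
    tmp(i) = a(i) + b(i)
    a(i) = 2.0*a(i)
  end do
  !$acc end kernels

  ! Refresh the host copy of a in the middle of the data region.
  !$acc update host(a)
  print *, 'a(1) inside the data region: ', a(1)

  !$acc kernels
  do i = 1, n
    c(i) = tmp(i)*a(i)
  end do
  !$acc end kernels

  !$acc end data

  ! tmp = 3 and a = 2 on the device, so c should be 6 everywhere.
  if (maxval(abs(c - 6.0)) > 1.0e-6) error stop 'unexpected result'
  print *, 'ok'
end program data_clauses_sketch
```

Because both kernels sit inside one data region, tmp and a stay on the device between them and only the update moves data in the middle.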

- Mat
All times are GMT - 7 Hours