PGI User Forum

update host async
Alexey A. Romanenko



Joined: 17 Feb 2012
Posts: 36

Posted: Thu Aug 02, 2012 3:42 am    Post subject: update host async

Hi all!

Here is some test code:
Code:
subroutine zz(a, b)
INTEGER, PARAMETER :: Nvec = 10000, Nchunks = 10000
REAL*8 :: a(*), b(*)
!$acc data create(a(1:Nvec*Nchunks),b(1:Nvec*Nchunks))
DO j = 0,Nchunks-1
k=Nvec*j+1        ! first element of chunk j
l=k+Nvec-1        ! last element of chunk j
!$acc update device(a(k:l)) async(j)
!$acc parallel loop async(j)
DO i = 1,Nvec
b(k+i-1) = SQRT(a(k+i-1)*2d0)
ENDDO
!$acc update host(b(k:l)) async(j)
ENDDO
!$acc wait
!$acc end data

end subroutine

Program main
INTEGER, PARAMETER :: Nvec = 10000, Nchunks = 10000
REAL*8 :: a(1:Nvec*Nchunks), b(1:Nvec*Nchunks)
DO j = 0,Nchunks*Nvec-1
a(j+1)=j
enddo
call zz(a,b)
write(*,*) "sum = ",SUM(b)
end


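The profiler shows that the "update host async" directive produces a synchronous call (memcpyDtoH rather than an async variant, unlike the memcpyHtoDasync used for "update device"):
Code: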
method,gputime,cputime,occupancy
method=[ memcpyHtoDasync ] gputime=[ 52.672 ] cputime=[ 7.000 ]
method=[ zz_10_gpu ] gputime=[ 5.856 ] cputime=[ 7.000 ] occupancy=[ 0.667 ]
method=[ memcpyDtoH ] gputime=[ 49.184 ] cputime=[ 120.000 ]
method=[ memcpyHtoDasync ] gputime=[ 53.856 ] cputime=[ 6.000 ]
method=[ zz_10_gpu ] gputime=[ 5.248 ] cputime=[ 7.000 ] occupancy=[ 0.667 ]
method=[ memcpyDtoH ] gputime=[ 49.143 ] cputime=[ 121.000 ]

Is this my error, or has "async" not been implemented yet?
mkcolg



Joined: 30 Jun 2004
Posts: 6215
Location: The Portland Group Inc.

Posted: Thu Aug 02, 2012 10:03 am

Hi Alexey,

What's happening is that a helper thread is spawned to handle the device-to-host transfer. So while we don't call CUDA's asynchronous memory copy routine, the helper thread, and thus the copy, runs asynchronously with respect to the main host thread.

Hope this helps,
Mat
Alexey A. Romanenko



Joined: 17 Feb 2012
Posts: 36

Posted: Sun Aug 05, 2012 10:24 pm

Thanks Mat!

Are you going to implement it in the next release?

Alexey
mkcolg



Joined: 30 Jun 2004
Posts: 6215
Location: The Portland Group Inc.

Posted: Mon Aug 06, 2012 10:54 am

Quote:
Are you going to implement it in the next release?
No. The problem is that there are no callbacks from the device, so when doing an asynchronous device-to-host transfer, there is no way to know when the transfer has completed. Hence the use of the helper thread.

- Mat
Alexey A. Romanenko



Joined: 17 Feb 2012
Posts: 36

Posted: Tue Aug 07, 2012 10:27 pm

Why not?

You have CUDA underneath, so you could use:
Code:

cudaError_t    cudaStreamCreate (cudaStream_t *pStream)
cudaError_t    cudaStreamDestroy (cudaStream_t stream)
cudaError_t    cudaStreamQuery (cudaStream_t stream)
cudaError_t    cudaStreamSynchronize (cudaStream_t stream)
All times are GMT - 7 Hours