|
| Author |
Message |
Alexey A. Romanenko
Joined: 17 Feb 2012 Posts: 31
|
Posted: Thu Aug 02, 2012 3:42 am Post subject: update host async |
|
|
Hi all!
Here is my test code:
| Code: | subroutine zz(a, b)
INTEGER, PARAMETER :: Nvec = 10000, Nchunks = 10000
REAL*8 :: a(*), b(*)
!$acc data create(a(1:Nvec*Nchunks),b(1:Nvec*Nchunks))
DO j = 0,Nchunks-1
k=Nvec*j+1
l=k+Nvec-1
!$acc update device(a(k:l)) async(j)
!$acc parallel loop async(j)
DO i = 1,Nvec
b(k+i-1) = SQRT(a(k+i-1)*2d0)
ENDDO
!$acc update host(b(k:l)) async(j)
ENDDO
!$acc wait
!$acc end data
end subroutine
Program main
INTEGER, PARAMETER :: Nvec = 10000, Nchunks = 10000
REAL*8 :: a(1:Nvec*Nchunks), b(1:Nvec*Nchunks)
DO j = 0,Nchunks*Nvec-1
a(j+1)=j
enddo
call zz(a,b)
write(*,*) "sum = ",SUM(b)
end |
The profiler shows that the "update host async" directive produces a synchronous call:
| Code: | method,gputime,cputime,occupancy
method=[ memcpyHtoDasync ] gputime=[ 52.672 ] cputime=[ 7.000 ]
method=[ zz_10_gpu ] gputime=[ 5.856 ] cputime=[ 7.000 ] occupancy=[ 0.667 ]
method=[ memcpyDtoH ] gputime=[ 49.184 ] cputime=[ 120.000 ]
method=[ memcpyHtoDasync ] gputime=[ 53.856 ] cputime=[ 6.000 ]
method=[ zz_10_gpu ] gputime=[ 5.248 ] cputime=[ 7.000 ] occupancy=[ 0.667 ]
method=[ memcpyDtoH ] gputime=[ 49.143 ] cputime=[ 121.000 ] |
Is this my error, or is "async" not implemented yet? |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Thu Aug 02, 2012 10:03 am Post subject: |
|
|
Hi Alexey,
What's happening is that a helper thread is being spawned to handle the device to host transfer. So while we don't call CUDA's async memory routine, the thread, and thus the copy, is being run asynchronously to the main host thread.
Hope this helps,
Mat |
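The helper-thread approach described above can be sketched roughly as follows. This is only an illustration in plain C with POSIX threads, not the PGI runtime's actual code: `copy_job`, `copy_worker`, `start_async_copy` and `wait_for_copy` are hypothetical names, and `memcpy` stands in for the blocking device-to-host copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical job descriptor for one pending "device to host" copy. */
typedef struct {
    const double *src;    /* stands in for the device buffer */
    double       *dst;    /* host destination buffer */
    size_t        count;  /* number of elements to copy */
    pthread_t     thread; /* helper thread that performs the copy */
} copy_job;

/* Helper-thread body: performs the blocking copy.
   Assumption: memcpy replaces the synchronous device-to-host transfer. */
void *copy_worker(void *arg)
{
    copy_job *job = arg;
    assert(job != NULL);
    memcpy(job->dst, job->src, job->count * sizeof(double));
    return NULL;
}

/* Start the copy on a helper thread and return at once (0 on success). */
int start_async_copy(copy_job *job)
{
    return pthread_create(&job->thread, NULL, copy_worker, job);
}

/* Block until the helper thread has finished, the role that
   "!$acc wait" plays in the test code (0 on success). */
int wait_for_copy(copy_job *job)
{
    return pthread_join(job->thread, NULL);
}
```

The caller fills in a `copy_job`, calls `start_async_copy`, continues with other host work, and calls `wait_for_copy` before reading `dst`. The copy itself is still a blocking call, which would be consistent with the profiler listing a plain `memcpyDtoH`, but it does not block the main host thread.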
|
|
 |
Alexey A. Romanenko
Joined: 17 Feb 2012 Posts: 31
|
Posted: Sun Aug 05, 2012 10:24 pm Post subject: |
|
|
Thanks Mat!
Are you going to implement it in the next release?
Alexey |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Aug 06, 2012 10:54 am Post subject: |
|
|
| Quote: | Are you going to implement it in the next release? |
No. The problem is that there are no callbacks from the device, so when doing an asynchronous device-to-host transfer there is no way to know that the data transfer is complete. Hence the use of the helper thread.
- Mat |
|
|
 |
Alexey A. Romanenko
Joined: 17 Feb 2012 Posts: 31
|
Posted: Tue Aug 07, 2012 10:27 pm Post subject: |
|
|
Why not?
You have CUDA at the lower layer, so you could use its stream API:
| Code: |
cudaError_t cudaStreamCreate (cudaStream_t *pStream)
cudaError_t cudaStreamDestroy (cudaStream_t stream)
cudaError_t cudaStreamQuery (cudaStream_t stream)
cudaError_t cudaStreamSynchronize (cudaStream_t stream)
|
|
|
|
 |
|
|
|