PGI User Forum


How to use cudaMemcpy3DAsync?

 
steve.xu



Joined: 20 Feb 2012
Posts: 25

Posted: Wed Apr 04, 2012 2:08 am    Post subject: How to use cudaMemcpy3DAsync?

Hi everyone,
I am trying to implement asynchronous data transfer between the GPU and the CPU in CUDA Fortran. My data is a 3D array, so I assume I should use cudaMemcpy3DAsync. But the CUDA Fortran reference is very brief on this routine, and I do not know how to fill in the "cudaMemcpy3DParms" structure. Does anybody have experience with asynchronous transfer of a 3D array?
By the way, if I want to copy a 4D array to the GPU asynchronously, must I split it into many 3D arrays, or is there an alternative?

Thanks!
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Apr 04, 2012 10:47 am

Hi Steve,

I don't have an example offhand but could pull one together. That said, it's probably not necessary to use the 3D functions. If you are copying the entire array, you can simply use cudaMemcpyAsync. Fortran arrays are contiguous in memory, so just copy the array as a 1-D array of size N*M*L. The same can be done with a 4-D array.

- Mat
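
A minimal host-only sketch of the point above, written in plain Python rather than CUDA Fortran and making no CUDA calls. It lays out a hypothetical array A(N, M, L) in Fortran's column-major order and checks that the elements occupy one unbroken run of N*M*L positions, which is why a single flat copy is enough. The sizes, the `offset` helper, and the list slice standing in for the copy are all assumptions made for the illustration.

```python
# Host-only illustration; no CUDA calls are made here.
N, M, L = 4, 3, 2  # hypothetical extents of A(N, M, L)


def offset(i, j, k):
    """Zero-based position of Fortran element A(i,j,k) (1-based indices)
    in column-major storage, where the first index varies fastest."""
    return (i - 1) + N * ((j - 1) + M * (k - 1))


# Fill a flat buffer the way Fortran would store A.
host = [0] * (N * M * L)
for k in range(1, L + 1):
    for j in range(1, M + 1):
        for i in range(1, N + 1):
            host[offset(i, j, k)] = 100 * i + 10 * j + k

# The offsets cover 0 .. N*M*L-1 exactly once, so the array is one
# contiguous run and a single flat copy of N*M*L elements moves all of it.
offsets = sorted(offset(i, j, k)
                 for k in range(1, L + 1)
                 for j in range(1, M + 1)
                 for i in range(1, N + 1))
whole_array_is_contiguous = offsets == list(range(N * M * L))

count = N * M * L       # element count for the flat 1-D copy
device = host[:count]   # list slice standing in for the actual transfer
```

In a real program the flat copy would be the cudaMemcpyAsync call. For the transfer to actually overlap with host work, the host array generally has to live in pinned (page-locked) memory.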
steve.xu



Joined: 20 Feb 2012
Posts: 25

Posted: Tue Apr 10, 2012 4:21 am

Thanks Mat.
Actually, I am trying to write a program that overlaps asynchronous data transfer with kernel execution. I think I have to divide my data (a 4D array) into several parts, so that each part is transferred in its own stream and the kernel in that same stream runs once its part has arrived. Can I also use cudaMemcpyAsync to do this?
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Apr 11, 2012 9:03 am

Quote:
Can I also use cudaMemcpyAsync to do this?
Sure, provided that you block the data so that each part is a contiguous section of memory. Otherwise, you will need to use the 3D routine (cudaMemcpy3DAsync).

- Mat
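
A rough host-only sketch of what "contiguous sections" means here, again in plain Python with no CUDA calls. Because Fortran stores the first index fastest, splitting along the last dimension gives slabs that are each one unbroken run, while a block that cuts the first dimensions is not. The extents, the chunk count, and the helpers `offset`, `slab`, and `is_contiguous` are hypothetical names invented for this illustration. The same reasoning extends to a 4D array split along its fourth dimension.

```python
# Host-only illustration; no CUDA calls are made here.
N1, N2, N3 = 8, 6, 4   # hypothetical extents of A(N1, N2, N3)
NCHUNKS = 2            # hypothetical number of chunks, e.g. one per stream


def offset(i, j, k):
    """Zero-based column-major position of A(i,j,k) (1-based indices)."""
    return (i - 1) + N1 * ((j - 1) + N2 * (k - 1))


def slab(chunk):
    """Start offset and element count of A(:, :, k_lo:k_hi) for one chunk
    of the last dimension. Assumes N3 is divisible by NCHUNKS."""
    per_chunk = N3 // NCHUNKS
    k_lo = chunk * per_chunk + 1
    return offset(1, 1, k_lo), N1 * N2 * per_chunk


def is_contiguous(i_rng, j_rng, k_rng):
    """True if the section A(i_rng, j_rng, k_rng) is one unbroken run."""
    offs = sorted(offset(i, j, k) for k in k_rng for j in j_rng for i in i_rng)
    return offs == list(range(offs[0], offs[0] + len(offs)))


# Each entry is the (start, count) of one flat copy.
slabs = [slab(c) for c in range(NCHUNKS)]

# A slab along the last dimension is contiguous ...
slab_ok = is_contiguous(range(1, N1 + 1), range(1, N2 + 1),
                        range(1, N3 // 2 + 1))
# ... but a block that halves every dimension is not.
corner_ok = is_contiguous(range(1, N1 // 2 + 1), range(1, N2 // 2 + 1),
                          range(1, N3 // 2 + 1))
```

With this kind of blocking, each (start, count) pair describes one flat asynchronous copy that could be issued in its own stream, followed by the kernel launch for that slab in the same stream.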
steve.xu



Joined: 20 Feb 2012
Posts: 25

Posted: Mon Apr 16, 2012 12:37 am

Thanks Mat!
I cannot find any examples of how to use cudaMemcpy3DAsync. I do need to copy part of a 3D array (say A(N1,N2,N3)) each time in order to overlap communication and computation. For example, I need to copy a part such as A(N1/2,N2/2,N3/2), launch a kernel, then copy another part of A and launch the kernel again.
Can I use cudaMemcpyAsync for this, or how would I do it with cudaMemcpy3DAsync?
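
As a host-only sketch of the memory layout behind this question (plain Python, no CUDA calls), the block below lists the contiguous runs that make up a sub-block which halves every dimension of A(N1,N2,N3). The extents and the helpers `offset` and `runs` are assumptions made for the illustration, and the sub-block is assumed to start at a given corner (i0, j0, k0). The result is the geometry a pitched 3-D copy describes: a row width, a row pitch, a height, and a depth. How those values map onto the fields of cudaMemcpy3DParms in CUDA Fortran is not shown here and should be checked against the reference.

```python
# Host-only illustration; no CUDA calls are made here.
N1, N2, N3 = 8, 6, 4                       # hypothetical extents of A
B1, B2, B3 = N1 // 2, N2 // 2, N3 // 2     # extents of the sub-block


def offset(i, j, k):
    """Zero-based column-major position of A(i,j,k) (1-based indices)."""
    return (i - 1) + N1 * ((j - 1) + N2 * (k - 1))


def runs(i0, j0, k0):
    """(start, length) of each contiguous run in the sub-block
    A(i0:i0+B1-1, j0:j0+B2-1, k0:k0+B3-1)."""
    return [(offset(i0, j, k), B1)
            for k in range(k0, k0 + B3)
            for j in range(j0, j0 + B2)]


block_runs = runs(1, 1, 1)

# Spacing, in elements, between the starts of consecutive rows and slices.
row_pitch = block_runs[1][0] - block_runs[0][0]
slice_pitch = block_runs[B2][0] - block_runs[0][0]
```

The sub-block consists of B2*B3 separate runs of B1 elements each, so it cannot be moved with one flat copy. Each run could in principle be sent as its own flat copy, but many small transfers are usually slow, which is the case the 3D routine is meant for.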



All times are GMT - 7 Hours