PGI User Forum

Asynchronous Memory Copy in CUDA Fortran

 
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Thu Jun 03, 2010 5:58 am    Post subject: Asynchronous Memory Copy in CUDA Fortran

Folks,

I was wondering if anyone has experience with, or examples of, using asynchronous memcpy in CUDA Fortran? At the moment, one of my programs is structured like this:

Code:
compute Aerosol Arrays
copy All Device Arrays, including Aerosol Arrays, to device
copy Constant Data to device
execute Kernel


The issue is that the compute Aerosol Arrays step can be quite long, so I figure why not try to overlap as much of the memory copy with that step as I can. In truth, a good chunk of the data copied to the device is those Aerosol Arrays, but, well, every little bit is nice (plus I can learn for the future).

From what I can glean from the CUDA Fortran guides, I assume I'll have to use the API calls since I don't think the implicit memory copies are asynchronous. Is this correct?

If so, that's why I thought I'd ask for examples while I stumble through cudaStreamCreate, cudaMemcpyToSymbolAsync, etc.

Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Thu Jun 03, 2010 1:07 pm

Hi Matt,

Although I haven't done it myself, you should be able to use the CUDA API calls to accomplish this. I don't have an example, though (sorry).

We're currently working on expanding the CUDA Fortran language to define this asynchronous behavior. Unfortunately, it doesn't fit well into the current Fortran syntax, so we'll most likely need to add an extension.

- Mat
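
For anyone reading along, here is a minimal sketch of what the API route might look like. It is not from either poster and makes several assumptions: the array names (other_h, aerosol_h, and so on) and the size n are hypothetical, the host arrays are declared pinned because the copy can only be truly asynchronous from page-locked memory, and the cuda_stream_kind kind parameter may not exist in older compiler releases (a default integer was used for streams at one time). The host loop is only a stand-in for the "compute Aerosol Arrays" step.

```fortran
program overlap_copy
  use cudafor
  implicit none
  integer, parameter :: n = 1024*1024          ! hypothetical array size
  ! Pinned (page-locked) host arrays; pageable memory would force a synchronous copy.
  real, allocatable, pinned :: other_h(:), aerosol_h(:)
  real, device, allocatable :: other_d(:), aerosol_d(:)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat, i

  allocate(other_h(n), aerosol_h(n), other_d(n), aerosol_d(n))
  other_h = 1.0

  istat = cudaStreamCreate(stream)

  ! Queue the copy of data that does not depend on the aerosol step.
  ! In CUDA Fortran the count is given in elements, not bytes.
  istat = cudaMemcpyAsync(other_d, other_h, n, stream)

  ! Host work that can overlap with the transfer above
  ! (stand-in for "compute Aerosol Arrays").
  do i = 1, n
     aerosol_h(i) = sqrt(real(i))
  end do

  ! The aerosol data can only be queued once it has been computed.
  istat = cudaMemcpyAsync(aerosol_d, aerosol_h, n, stream)

  ! Wait for both copies before using the device data from another stream.
  ! A kernel launched in this same stream would be ordered after them anyway.
  istat = cudaStreamSynchronize(stream)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  istat = cudaStreamDestroy(stream)
  deallocate(other_h, aerosol_h, other_d, aerosol_d)
end program overlap_copy
```

As noted in the original question, the aerosol arrays themselves cannot be overlapped this way, since they are only ready once the host computation finishes. Only the transfers that do not depend on that step benefit.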
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Fri Jun 04, 2010 11:11 am

Hmm. Okay. Do you have any examples showing the allocation/copy process using the API calls?

I ask mainly for the 2D and larger arrays. I figure cudaMalloc and cudaMemcpy are pretty simple, since 1D is 1D in Fortran or C. But when one starts getting into the 2D realm, I'm wondering whether you have to use cudaMallocPitch/cudaMemcpy2D (since Fortran arrays usually don't act like C "arrays")?

ETA: Never mind, I figured this out (essentially it does what a padded array version of a program I wrote does). I'm next going to start a new topic on 3D arrays since that's all new to me.
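
For later readers, the padded-array idea mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code the poster ended up with: the sizes nx and ny and the alignment of 32 elements are hypothetical, and the padding is done by hand with an ordinary allocate rather than cudaMallocPitch, which would pick the pitch itself. The cudaMemcpy2D call takes pitches and widths in elements in CUDA Fortran, and because Fortran is column-major, the "width" is the leading (first) dimension.

```fortran
program padded_copy
  use cudafor
  implicit none
  integer, parameter :: nx = 100, ny = 64      ! hypothetical array extents
  integer, parameter :: align = 32             ! assumed padding granularity, in elements
  ! Round the leading dimension up to the next multiple of align.
  integer, parameter :: pitch = ((nx + align - 1) / align) * align
  real :: a(nx, ny), b(nx, ny)
  real, device, allocatable :: a_d(:,:)
  integer :: istat

  call random_number(a)

  ! Device array with a padded leading dimension; only a_d(1:nx, :) holds data.
  allocate(a_d(pitch, ny))

  ! Copy ny columns of nx elements each; the destination columns are
  ! pitch elements apart and the source columns are nx elements apart.
  istat = cudaMemcpy2D(a_d(1,1), pitch, a(1,1), nx, nx, ny)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  ! Copy the unpadded section back with an array assignment and compare.
  b = a_d(1:nx, :)
  print *, 'max abs difference: ', maxval(abs(a - b))

  deallocate(a_d)
end program padded_copy
```

Note that for an unpadded 2D device array no special call is needed at all: a plain allocate(a_d(nx,ny)) followed by the assignment a_d = a handles the copy.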
All times are GMT - 7 Hours