PGI User Forum


cudaMemsetAsync and easier syntax for async copying

 
TroelsH



Joined: 24 Mar 2010
Posts: 9

Posted: Thu Dec 20, 2012 9:17 am    Post subject: cudaMemsetAsync and easier syntax for async copying

Is cudaMemsetAsync going to be supported in a future release?

It would be very helpful for setting things up before a kernel call, because zeroing a device array, for example, can often be started before other (CPU-related) initialization work. Likewise, a simple way to copy constants asynchronously would be very nice.

Related to this issue: have you considered extending the simple syntax
Code:
a_dev = a_host

for copying the host variable a_host to the device variable a_dev so that it also allows async copying? Maybe one could use a directive like
Code:
!$cuf async(stream)
a_dev = a_host

Just a thought. It would clean up a lot of code, make it more readable, and make it much easier to extend a program to support async memory transfers.
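
For comparison, here is a minimal sketch of the explicit form that such a directive would replace, using cudaMemcpyAsync from cudafor. The program name, the array size and the use of a pinned host array are assumptions made for illustration, and the sketch assumes a cudafor release that provides cuda_stream_kind.

```fortran
program async_copy_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 1024            ! illustrative size (assumption)
  ! Pinned host memory lets the copy overlap with host work.
  real, allocatable, pinned :: a_host(:)
  real, allocatable, device :: a_dev(:)
  ! Assumes a cudafor release that defines cuda_stream_kind.
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  allocate(a_host(n), a_dev(n))
  a_host = 1.0

  istat = cudaStreamCreate(stream)

  ! Explicit asynchronous counterpart of "a_dev = a_host".
  ! In CUDA Fortran the count is given in elements, not bytes.
  istat = cudaMemcpyAsync(a_dev, a_host, n, cudaMemcpyHostToDevice, stream)

  ! ... independent CPU-side initialization could go here ...

  ! Wait for the copy before the device data is used.
  istat = cudaStreamSynchronize(stream)

  istat = cudaStreamDestroy(stream)
  deallocate(a_host, a_dev)
end program async_copy_sketch
```

The stream creation, the explicit call and the synchronization are the bookkeeping that a one-line directive would hide.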

For the time being, I have found a way to call the CUDA API directly by interfacing to the C routine. This seems to work:
Code:
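interface
  ! C prototype: cudaError_t cudaMemsetAsync(void *devPtr, int value, size_t count, cudaStream_t stream)
  function cudaMemsetAsync(arr, value, bytes, stream) bind(c,name='cudaMemsetAsync')
    use iso_c_binding
    use cudafor
    integer(c_int)           :: cudaMemsetAsync   ! function result: must not carry the VALUE attribute
    type(c_devptr),    value :: arr
    ! Note: cudaStream_t is a pointer type in C, so c_int is only a safe
    ! match for the default stream (0) on 64-bit systems.
    integer(c_int),    value :: value, stream
    integer(c_size_t), value :: bytes
  end function cudaMemsetAsync
end interface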

In my code I use it this way:
Code:
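  use iso_c_binding
  use cudafor                        ! provides c_devptr, c_devloc and the device attribute
  integer :: n, ierr, stream
  integer, allocatable,device :: i(:)
  type(c_devptr) :: i_ptr
  integer(c_size_t) :: nbytes
  ...
  allocate(i(n))
  nbytes = n*4                       ! 4 bytes per default integer
  i_ptr = c_devloc(i)                ! raw device address of i
  ierr = cudaMemsetAsync(i_ptr, 0, nbytes, stream)   ! pass the c_devptr, not the array itself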

but it would be easier if it were included in cudafor.

best,

Troels
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Thu Dec 20, 2012 3:18 pm    Post subject:

Hi Troels,

Because cudaMemset only takes 32-bit values, we decided to write our own implementation. However, we didn't add cudaMemsetAsync. I asked our engineering manager, who said that we probably can't do anything in the short term but that we will see what we can do.
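
As an illustration of the synchronous routine that cudafor does provide, the sketch below fills a double-precision device array with cudaMemset. It assumes that cudafor's cudaMemset accepts a value matching the array's type and kind and takes its count in elements; the program name and array size are made up for the example.

```fortran
program memset_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 1024            ! illustrative size (assumption)
  real(8), allocatable, device :: a_dev(:)
  real(8), allocatable :: a_host(:)
  integer :: istat

  allocate(a_dev(n), a_host(n))

  ! Assumption: the value matches the type and kind of a_dev,
  ! and the count is in elements rather than bytes.
  istat = cudaMemset(a_dev, 1.0d0, n)

  ! Copy back to the host and check that every element was set.
  a_host = a_dev
  print *, 'all elements set: ', all(a_host == 1.0d0)

  deallocate(a_dev, a_host)
end program memset_sketch
```

This call takes no stream argument, which is the gap that an asynchronous variant would fill.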

- Mat
TroelsH



Joined: 24 Mar 2010
Posts: 9

Posted: Thu Dec 20, 2012 6:46 pm    Post subject:

Hi Mat,

Thanks for the fast reply. I will keep my CUDA-C interfacing for the time being then, and dream about !$cuf async's in a distant future :-)

Troels
All times are GMT - 7 Hours