PGI User Forum

cudaMemcpy fails copying ACC variable to CUF variable

 
rikisyo
Joined: 20 Jun 2013
Posts: 15

Posted: Wed Aug 07, 2013 8:17 am    Post subject: cudaMemcpy fails copying ACC variable to CUF variable

This sample code exposes a problem I hit when trying to cudaMemcpy between the device copy of an OpenACC variable and a CUDA Fortran device variable:

Code:

program cuf2acc

use cudafor
use openacc

implicit none

integer, parameter :: DA=8

real, allocatable, dimension(:) :: sendbuf, recvbuf
real, device, allocatable, dimension(:) :: sendbuf_d, recvbuf_d

integer :: i

call acc_init( acc_device_default )

allocate(sendbuf(DA),recvbuf(DA))
allocate(sendbuf_d(DA),recvbuf_d(DA))

sendbuf=-2. ; recvbuf=-2.
sendbuf_d=1 ; recvbuf_d=-1.

!$acc data copy( sendbuf, recvbuf )
i = cudaDeviceSynchronize()
!$acc host_data use_device( sendbuf, recvbuf )
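! intent: within host_data/use_device, sendbuf and recvbuf below should resolve to
! their device copies, making these device-to-device copies from the CUF arrays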
i = cudaMemcpy(sendbuf,sendbuf_d,DA,cudaMemcpyDeviceToDevice)
i = cudaMemcpy(recvbuf,recvbuf_d,DA,cudaMemcpyDeviceToDevice)
!$acc end host_data
!$acc end data

write(*,'("S: "8F8.2)') (sendbuf(i),i=1,DA)
write(*,'("R: "8F8.2)') (recvbuf(i),i=1,DA)

end program cuf2acc


It is compiled with:
Code:

$ pgf90 -o cuf2acc -Mcuda -acc -ta=nvidia,cc20 cuf2acc.f90


When it runs, it always gives this error:
Code:

0: copyover Memcpy (dst=0xa15d80, src=0xb00200000, size=32) FAILED: 11(invalid argument)


It seems that dst is a host address. I tried removing the "host_data" construct around the memcpy, and the error is the same. It seems to me that the host_data construct never exposes the device addresses of the ACC variables. Why is that, and how can I get around it?
mkcolg
Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Wed Aug 07, 2013 10:36 am

Hi rikisyo,

Are you trying to test the "host_data" directive, or do you just need to copy the data between the device and the host? The "host_data" directive isn't supported in Fortran yet, which is why this example fails.

If you just need to copy the data, the simplest method is to assign the device array to the host array. No OpenACC is needed at all.

Code:
program cuf2acc

use cudafor
use openacc

implicit none

integer, parameter :: DA=8

real, allocatable, dimension(:) :: sendbuf, recvbuf
real, device, allocatable, dimension(:) :: sendbuf_d, recvbuf_d

integer :: i

call acc_init( acc_device_default )

allocate(sendbuf(DA),recvbuf(DA))
allocate(sendbuf_d(DA),recvbuf_d(DA))

sendbuf=-2. ; recvbuf=-2.
sendbuf_d=1 ; recvbuf_d=-1.

! CUDA Fortran array assignment between a device array and a host array
! performs the device-to-host copy
sendbuf=sendbuf_d
recvbuf=recvbuf_d

write(*,'("S: "8F8.2)') (sendbuf(i),i=1,DA)
write(*,'("R: "8F8.2)') (recvbuf(i),i=1,DA)

end program cuf2acc


- Mat
rikisyo
Joined: 20 Jun 2013
Posts: 15

Posted: Thu Aug 08, 2013 7:16 am

Thanks for the reply!

My original purpose is not just to copy data back and forth between the host and the device, but to use CUDA-aware MPI calls to do the face-data exchange in my multiblock CFD code. In particular, I would be extremely happy if there were a supported way to do "CUDA-aware MPI and OpenACC" as described on the following webpage:

https://developer.nvidia.com/content/benchmarking-cuda-aware-mpi

If host_data is not supported, then there is no way these MPI APIs can work with ACC variables, for much the same reason as the cudaMemcpy failure. Is there a way to safely expose the device addresses so that cudaMemcpy can work?
mkcolg
Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Thu Aug 08, 2013 7:34 am

Hi rikisyo,

First let me clarify that the host_data directive is supported in C and under development in Fortran.

Reading the article, it seems to me that you would simply pass the CUDA Fortran device array to the MPI calls. Depending on the MPI implementation, you may need to write an explicit interface so that MPI_SENDRECV accepts device buffers.
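
For illustration (not part of the original post), here is a minimal sketch of that kind of interface, assuming mpif.h-style MPI usage, a CUDA-aware MPI library, and real(4) buffers:

Code:

! Sketch only: declare MPI_SENDRECV with device-resident buffer dummies so the
! compiler passes the device addresses of CUDA Fortran arrays to the MPI library.
! (mpif.h declares no interfaces, so supplying this one does not conflict.)
interface
   subroutine MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag,   &
                           recvbuf, recvcount, recvtype, source, recvtag, &
                           comm, status, ierror)
      real, device :: sendbuf(*)   ! device send buffer
      real, device :: recvbuf(*)   ! device receive buffer
      integer :: sendcount, sendtype, dest, sendtag
      integer :: recvcount, recvtype, source, recvtag
      integer :: comm, status(*), ierror
   end subroutine MPI_SENDRECV
end interface


With such an interface visible in the caller, a call such as MPI_SENDRECV(sendbuf_d, DA, MPI_REAL, ...) hands the device pointers directly to the CUDA-aware MPI library.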

- Mat