CUDA Fortran and CUDA API: Constant Memory and Symbol

Posted: Thu Jun 10, 2010 8:25 am

This thread cannot be much of a surprise as I stumble my way through using the CUDA API with CUDA Fortran. After figuring out--thanks to Brent--how to do asynchronous copies to global memory of 1D and 2D data (and 3D as 1D), I've now moved on to the next set of CPU-to-GPU copies I make: asynchronous constant memory copies.

So, being a naïf, I decided to do the usual and experiment with cudaMemcpyToSymbol before adding the Async. Unfortunately, I can't seem to get it just right. To wit, I have, modifying the code Brent provided:
module test3d
   integer*4, device, allocatable :: x(:,:,:)
   integer, constant :: d_constant
contains
   attributes(global) subroutine s1()
      i = threadidx%x
      j = threadidx%y
      k = blockidx%x
      x(i,j,k) = x(i,j,k) + i + j + k + d_constant
   end subroutine s1
end module

program t

use test3d
use cudafor

integer, parameter :: N = 20
integer*4, allocatable, pinned :: h(:,:,:)
type(dim3) :: ngrid, nblock
integer :: h_constant
allocate(h(N,N,N), x(N,N,N))
h = 1
h_constant = 5
ngrid = dim3(N,1,1)
nblock= dim3(N,N,1)

istat = cudaMemcpyAsync(x,h,N*N*N)
if (istat .ne. 0) print *,"cudaMemcpyAsync 1 ",istat
istat = cudaThreadSynchronize()
istat = cudaMemcpyToSymbol(d_constant,h_constant,1)
if (istat .ne. 0) print *,"cudaMemcpyToSymbol ",istat
call s1 <<<ngrid, nblock>>> ()

istat = cudaMemcpyAsync(h,x,N*N*N)
if (istat .ne. 0) print *,"cudaMemcpyAsync 2 ",istat
istat = cudaThreadSynchronize()

do k = 1, N
   do j = 1, N
      do i = 1, N
         if (h(i,j,k) .ne. 1+i+j+k+h_constant) then
            print *,"Error at ",i,j,k
         end if
      end do
   end do
end do

end program t

where I've added h_constant, d_constant, the Symbol copy, and the extra code adding the constant in appropriate places.
But, when I try to compile:
> pgfortran test3d.cuf drivert.cuf
PGF90-S-0155-Could not resolve generic procedure cudamemcpytosymbol (drivert.cuf: 23)
  0 inform,   0 warnings,   1 severes, 0 fatal for t

Hmm, okay. So I tried various cudaMemcpyToSymbol calls, thinking I might need to pass a string (a la C), or size*kind rather than size, or even the offset (whose purpose I'm a bit unclear on...padding?):
istat = cudaMemcpyToSymbol(d_constant,h_constant,1)
istat = cudaMemcpyToSymbol(d_constant,h_constant,4)
istat = cudaMemcpyToSymbol("d_constant",h_constant,1)
istat = cudaMemcpyToSymbol("d_constant",h_constant,4)
istat = cudaMemcpyToSymbol(d_constant,h_constant,1,0)
istat = cudaMemcpyToSymbol("d_constant",h_constant,1,0)

All gave the same error as above. Does using "integer, constant" declare d_constant as an integer and not as type(cudaSymbol)? If so, I guess I could see that error (which I usually associate with typing issues).

(Note: This is to say nothing of the inevitable next question. If I have two (or more) constants, do I need to explicitly manage the "offset" part of cudaMemcpyToSymbolAsync?)
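
For comparison, CUDA Fortran also lets host code set a "constant" variable by plain assignment, which avoids the generic-resolution question entirely. The following is only a minimal sketch under that assumption, not a fix for the cudaMemcpyToSymbol call itself. The names constdemo, addc and demo are made up for illustration, and the array is 1D to keep it short. It has not been run against the compiler version used above.

```fortran
module constdemo
   integer, constant :: d_constant          ! lives in GPU constant memory
   integer, device, allocatable :: x(:)     ! lives in GPU global memory
contains
   attributes(global) subroutine addc()
      integer :: i
      i = threadidx%x
      x(i) = x(i) + d_constant              ! kernels may read, not write, the constant
   end subroutine addc
end module constdemo

program demo
   use constdemo
   use cudafor
   implicit none
   integer, parameter :: N = 32
   integer :: h(N)

   allocate(x(N))
   h = 1
   x = h                  ! host-to-device copy by assignment
   d_constant = 5         ! host-to-constant-memory copy by assignment
   call addc<<<1, N>>>()
   h = x                  ! device-to-host copy by assignment (blocking)

   ! Every element started at 1 and had d_constant (5) added once.
   if (any(h /= 6)) print *, "Error: unexpected value in h"
end program demo
```

With this form each constant is its own variable, so two or more constants are simply assigned one after the other and no offset bookkeeping is involved. In the CUDA C API the offset argument is a byte offset from the start of a single symbol, which suggests it only matters when updating part of one array rather than when juggling separate constants. Whether an asynchronous equivalent exists for constant data is left open here.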

