PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

Using C-pointer for 3D Fortran array

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Programming and Compiling
View previous topic :: View next topic  
Author Message
Tuan



Joined: 11 Jun 2009
Posts: 233

PostPosted: Tue Feb 11, 2014 12:16 pm    Post subject: Using C-pointer for 3D Fortran array Reply with quote

I am using CUDA Fortran. In my code, I am using linear array in GPU, and then copy it back to CPU using a linear array. This array is then RESHAPE() into a 3D array.

This is slow as I need 2 copies: one from GPU and one from CPU linear to CPU 3D array.

My question is if I can just declare a C-pointer to the 3D array, and then using cuda copy to copy linear array to this C-pointer which is pointing to the first element in the 3D array?

Thanks,
Tuan
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

PostPosted: Tue Feb 11, 2014 4:17 pm    Post subject: Reply with quote

Hi Tuan,

Quote:
My question is if I can just declare a C-pointer to the 3D array, and then using cuda copy to copy linear array to this C-pointer which is pointing to the first element in the 3D array?
Sure. You can use "integer(c_devptr)" for CUDA C device pointers and "integer(c_ptr)" for host C pointers. Then use "c_f_pointer" to create the association between you 3D arrays and the C pointer. You can grab the C pointer from a Fortran host array using "c_loc". I wrote a basic example below.

Though, I'm wondering if you really need to do all this. If you're just using CUDA Fortran and your arrays are contiguous, a simple "arr1 = arr2" statement will perform the same thing.

Hope this helps,
Mat

Code:
% cat test.cuf

module test_mod
  integer, device, allocatable :: x(:,:,:)
  contains
    attributes(global) subroutine kernel(l,m,n)
    integer, value :: l, m, n
    i = (blockidx%x-1)*blockdim%x + threadidx%x
    j = (blockidx%y-1)*blockdim%y + threadidx%y
    if (i <= l .and. j <= m ) then
       do k=1,N
         x(i,j,k) = ((i-1)*L*M)+((j-1)*M)+k
       enddo
    end if
    return
    end subroutine
end module test_mod

program test
use cudafor
use test_mod
implicit none
integer, parameter :: L = 64
integer, parameter :: M = 64
integer, parameter :: N = 64
integer size
integer istat
integer, target, allocatable, dimension(:,:,:) :: y
type(dim3) :: blocks
type(dim3) :: threads
type(c_ptr) :: h1
type(c_devptr) :: cd1

allocate(y(L,M,N))
size = L*M*N
istat = cudaMalloc(cd1, size*4)
call c_f_pointer(cd1, x, (/ L, M, N /))

blocks = dim3((L+31)/32, (M+31)/32, 1)
threads = dim3(32, 32, 1)

call kernel<<<blocks,threads>>>(L,M,N)
istat = cudaThreadSynchronize()
print *, cudaGetErrorString(cudaGetLastError())

h1 = c_loc(y)
istat = cudaMemcpy(h1, cd1, size*4, cudaMemcpyDeviceToHost)

print *, y(1,1,:)
print *, y(64,64,:)
deallocate(y)
istat = cudaFree(cd1)
end
% pgf90 test.cuf ; a.out
 no error
            1            2            3            4            5            6
            7            8            9           10           11           12
           13           14           15           16           17           18
           19           20           21           22           23           24
           25           26           27           28           29           30
           31           32           33           34           35           36
           37           38           39           40           41           42
           43           44           45           46           47           48
           49           50           51           52           53           54
           55           56           57           58           59           60
           61           62           63           64
       262081       262082       262083       262084       262085       262086
       262087       262088       262089       262090       262091       262092
       262093       262094       262095       262096       262097       262098
       262099       262100       262101       262102       262103       262104
       262105       262106       262107       262108       262109       262110
       262111       262112       262113       262114       262115       262116
       262117       262118       262119       262120       262121       262122
       262123       262124       262125       262126       262127       262128
       262129       262130       262131       262132       262133       262134
       262135       262136       262137       262138       262139       262140
       262141       262142       262143       262144
Back to top
View user's profile
Tuan



Joined: 11 Jun 2009
Posts: 233

PostPosted: Sun Feb 23, 2014 11:47 am    Post subject: Reply with quote

The reason I'm doing this is that I use a linear representation of a 3D array in kernel. However, when doing some calculation I only need to use a smaller part of the array (i.e. avoiding boundary or stencil gridpoints) so I need to reshape the array into 3D and extract the inner part of the array.

Currently, I need to copy the data back, then RESHAPE the array to use 3D-indexing to extract the data in the smaller subarray easier.

Do you have a better approach for this?

Thanks,
Tuan
Back to top
View user's profile
Tuan



Joined: 11 Jun 2009
Posts: 233

PostPosted: Sun Feb 23, 2014 11:50 am    Post subject: Reply with quote

My another question. If I use C-pointer in Fortran code pointing to data allocated by Fortran which is column-based, so when I use the C-pointer should the compiler treats it as row-based like in C or column-based like in Fortran?
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

PostPosted: Mon Feb 24, 2014 1:51 pm    Post subject: Reply with quote

Hi Tuan,

You can't access the c_ptr directly in Fortran but instead it's associated Fortran variable. For example in the above program, you can say "y(1,1,1)" but not "h1(1)".

Quote:
Currently, I need to copy the data back, then RESHAPE the array to use 3D-indexing to extract the data in the smaller subarray easier.

Do you have a better approach for this?
Have you tired using F90 pointers?

- Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Programming and Compiling All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group