PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

Problem declaring device types inside OpenMP parallel loop.

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
alfvenwave



Joined: 08 Apr 2010
Posts: 79

PostPosted: Thu Apr 08, 2010 8:55 am    Post subject: Problem declaring device types inside OpenMP parallel loop. Reply with quote

Hi. I'm having some trouble working out how to access multiple GPUs using openMP with the PGI fortran compiler "pgfortran".

Here is my problem. There is a very simple test program "openmpcudatest.cuf" below.

If I compile with "pgfortran openmpcudatest.cuf", I get the right answer (I have set OMP_NUM_THREADS=2):

ithread : 0 idev : 0
Try to copy HOST data to device...
Copy for loop cycle 1 OK
Set Adev to 1 using the assigned GPU....
ithread : 0 idev : 0
Try to copy HOST data to device...
Copy for loop cycle 2 OK
Set Adev to 1 using the assigned GPU....
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000

Now, if I compile with "pgfortran openmptest.cuf -mp", the code won't compile:

PGF90-S-0188-Argument number 1 to cudakerneltest: type mismatch (ompcudatest.cuf: 36)
0 inform, 0 warnings, 1 severes, 0 fatal for ompcudatest

i.e. it looks like the PGI compiler doesn't like the device variable Adev being decalred in the OMP pragma PRIVATE declaration. If I remove it, the code does indeed compile. However, when I then run the code, I get this:

ithread : 1 idev : 1
Try to copy HOST data to device...
ithread : 0 idev : 0
Try to copy HOST data to device...
Copy for loop cycle 1 OK
Set Adev to 1 using the assigned GPU....
copyin Memcpy FAILED:17

i.e. one of the threads (the second one) has failed to copy private host array A to its designated device.

All I want to be able to do is to treat each openMP thread as separate, with its own attached GPU device, then add up all the data at the end.

Please can anyone tell me how to declare device variables inside openMP parallelised loops, or otherwise how to get openMP and CUDA to play nicely with eachother using pgfortran?

Thanks,

Rob.




program ompcudatest

use cudafor
use cudaKernelTest

implicit none

integer :: i,ithread,idev,iflag
real :: A (10),B(10)
real, device :: Adev(10)
integer :: omp_get_thread_num

!$OMP PARALLEL PRIVATE(i,ithread,idev,Adev,A) SHARED(B)
!$OMP DO

do i=1,2
ithread = omp_get_thread_num()
iflag = cudaSetDevice(ithread)
iflag = cudaGetDevice(idev)

print*, 'ithread : ',ithread, 'idev : ',idev

print*, 'Try to copy HOST data to device...'

Adev = A

print*, 'Copy for loop cycle ',i,' OK'

call cudaKernelTest( Adev )

! Retrieve device array back to host for thread ithread

A = Adev

! Sum up arrays to shared host array B:

B = B + A

enddo

!$OMP END DO
!$OMP END PARALLEL

! 2 threads initiated, B should be 10 values of 2.0:

print*,B

end



! A very simple CUDA kernel + wrapper:

module cudaKernelTest

use cudafor

implicit none

contains

attributes(global) subroutine cudaKernelTest_kernel( n, Adev )

implicit none

integer, value :: n
real :: Adev(n)

Adev = 1.0

end subroutine cudaKernelTest_kernel

! Kernel wrapper:

subroutine cudaKernelTest( Adev )

implicit none

real, device :: Adev(:)
integer, value :: n

n = size(Adev,1)

print*, 'Set Adev to 1 using the assigned GPU....'

call cudaKernelTest_kernel<<<64,64>>>(n, Adev)

end subroutine cudaKernelTest

end module cudaKernelTest
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

PostPosted: Thu Apr 08, 2010 12:02 pm    Post subject: Reply with quote

Hi alfvenwave,

Currently OpenMP doesn't support "device" variables in the private clause. To work around this, pass "A" to "cudaKernelTest" and then declare "Adev" in "cudaKernelTest".

For example:
Code:
% cat openmp.cuf
! A very simple CUDA kernel + wrapper:

module cudaKernelTest
use cudafor

implicit none

contains

attributes(global) subroutine cudaKernelTest_kernel( n, Adev )

implicit none
integer, value :: n
real :: Adev(n)

Adev = 1.0

end subroutine cudaKernelTest_kernel

! Kernel wrapper:
subroutine cudaKernelTest( A )

implicit none

real :: A(10)
real, device :: Adev(10)
integer, value :: n

Adev = 1.0
n = size(Adev,1)
!print*, 'Set Adev to 1 using the assigned GPU....'
call cudaKernelTest_kernel<<<64,64>>>(n, Adev)
A=Adev

end subroutine cudaKernelTest
end module cudaKernelTest

program ompcudatest

use cudafor
use cudaKernelTest

implicit none

integer :: i,ithread,idev,iflag
real :: A (10),B(10)
integer :: omp_get_thread_num

!$OMP PARALLEL PRIVATE(i,ithread,idev,A) SHARED(B)

ithread = omp_get_thread_num()
iflag = cudaSetDevice(ithread)
iflag = cudaGetDevice(idev)
print*, 'ithread : ',ithread, 'idev : ',idev

!$OMP DO
do i=1,2

!print*, ithread, ' Copy for loop cycle ',i,' OK'

call cudaKernelTest( A )

! Sum up arrays to shared host array B:

B = B + A
enddo

!$OMP END DO
!$OMP END PARALLEL

! 2 threads initiated, B should be 10 values of 2.0:
print*,B
end
% setenv OMP_NUM_THREADS 2
% pgf90 openmp.cuf -mp
% a.out
 ithread :             1 idev :             1
 ithread :             0 idev :             0
    2.000000        2.000000        2.000000        2.000000
    2.000000        2.000000        2.000000        2.000000
    2.000000        2.000000
xps730:/tmp/qa%   


Hope this helps,
Mat
Back to top
View user's profile
alfvenwave



Joined: 08 Apr 2010
Posts: 79

PostPosted: Fri Apr 09, 2010 6:17 am    Post subject: Reply with quote

Thanks mkcolg - that works.....
Back to top
View user's profile
Malcolm Bibby



Joined: 16 Nov 2009
Posts: 33

PostPosted: Mon Apr 12, 2010 8:35 am    Post subject: Reply with quote

What am I missing in the above code. idev never appears on the right hand side and yet its value appears to be set! Would someone please explain!

Malcolm
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

PostPosted: Mon Apr 12, 2010 10:45 am    Post subject: Reply with quote

Hi Malcom,

"idev" gets set in the call to cudaGetDevice.
Code:
iflag = cudaGetDevice(idev)


- Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group