|
| View previous topic :: View next topic |
| Author |
Message |
alfvenwave
Joined: 08 Apr 2010 Posts: 77
|
Posted: Thu Apr 08, 2010 8:55 am Post subject: Problem declaring device types inside OpenMP parallel loop. |
|
|
Hi. I'm having some trouble working out how to access multiple GPUs using openMP with the PGI fortran compiler "pgfortran".
Here is my problem. There is a very simple test program "openmpcudatest.cuf" below.
If I compile with "pgfortran openmpcudatest.cuf", I get the right answer (I have set OMP_NUM_THREADS=2):
ithread : 0 idev : 0
Try to copy HOST data to device...
Copy for loop cycle 1 OK
Set Adev to 1 using the assigned GPU....
ithread : 0 idev : 0
Try to copy HOST data to device...
Copy for loop cycle 2 OK
Set Adev to 1 using the assigned GPU....
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000
Now, if I compile with "pgfortran openmptest.cuf -mp", the code won't compile:
PGF90-S-0188-Argument number 1 to cudakerneltest: type mismatch (ompcudatest.cuf: 36)
0 inform, 0 warnings, 1 severes, 0 fatal for ompcudatest
i.e. it looks like the PGI compiler doesn't like the device variable Adev being decalred in the OMP pragma PRIVATE declaration. If I remove it, the code does indeed compile. However, when I then run the code, I get this:
ithread : 1 idev : 1
Try to copy HOST data to device...
ithread : 0 idev : 0
Try to copy HOST data to device...
Copy for loop cycle 1 OK
Set Adev to 1 using the assigned GPU....
copyin Memcpy FAILED:17
i.e. one of the threads (the second one) has failed to copy private host array A to its designated device.
All I want to be able to do is to treat each openMP thread as separate, with its own attached GPU device, then add up all the data at the end.
Please can anyone tell me how to declare device variables inside openMP parallelised loops, or otherwise how to get openMP and CUDA to play nicely with eachother using pgfortran?
Thanks,
Rob.
program ompcudatest
use cudafor
use cudaKernelTest
implicit none
integer :: i,ithread,idev,iflag
real :: A (10),B(10)
real, device :: Adev(10)
integer :: omp_get_thread_num
!$OMP PARALLEL PRIVATE(i,ithread,idev,Adev,A) SHARED(B)
!$OMP DO
do i=1,2
ithread = omp_get_thread_num()
iflag = cudaSetDevice(ithread)
iflag = cudaGetDevice(idev)
print*, 'ithread : ',ithread, 'idev : ',idev
print*, 'Try to copy HOST data to device...'
Adev = A
print*, 'Copy for loop cycle ',i,' OK'
call cudaKernelTest( Adev )
! Retrieve device array back to host for thread ithread
A = Adev
! Sum up arrays to shared host array B:
B = B + A
enddo
!$OMP END DO
!$OMP END PARALLEL
! 2 threads initiated, B should be 10 values of 2.0:
print*,B
end
! A very simple CUDA kernel + wrapper:
module cudaKernelTest
use cudafor
implicit none
contains
attributes(global) subroutine cudaKernelTest_kernel( n, Adev )
implicit none
integer, value :: n
real :: Adev(n)
Adev = 1.0
end subroutine cudaKernelTest_kernel
! Kernel wrapper:
subroutine cudaKernelTest( Adev )
implicit none
real, device :: Adev(:)
integer, value :: n
n = size(Adev,1)
print*, 'Set Adev to 1 using the assigned GPU....'
call cudaKernelTest_kernel<<<64,64>>>(n, Adev)
end subroutine cudaKernelTest
end module cudaKernelTest |
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Thu Apr 08, 2010 12:02 pm Post subject: |
|
|
Hi alfvenwave,
Currently OpenMP doesn't support "device" variables in the private clause. To work around this, pass "A" to "cudaKernelTest" and then declare "Adev" in "cudaKernelTest".
For example:
| Code: | % cat openmp.cuf
! A very simple CUDA kernel + wrapper:
module cudaKernelTest
use cudafor
implicit none
contains
attributes(global) subroutine cudaKernelTest_kernel( n, Adev )
implicit none
integer, value :: n
real :: Adev(n)
Adev = 1.0
end subroutine cudaKernelTest_kernel
! Kernel wrapper:
subroutine cudaKernelTest( A )
implicit none
real :: A(10)
real, device :: Adev(10)
integer, value :: n
Adev = 1.0
n = size(Adev,1)
!print*, 'Set Adev to 1 using the assigned GPU....'
call cudaKernelTest_kernel<<<64,64>>>(n, Adev)
A=Adev
end subroutine cudaKernelTest
end module cudaKernelTest
program ompcudatest
use cudafor
use cudaKernelTest
implicit none
integer :: i,ithread,idev,iflag
real :: A (10),B(10)
integer :: omp_get_thread_num
!$OMP PARALLEL PRIVATE(i,ithread,idev,A) SHARED(B)
ithread = omp_get_thread_num()
iflag = cudaSetDevice(ithread)
iflag = cudaGetDevice(idev)
print*, 'ithread : ',ithread, 'idev : ',idev
!$OMP DO
do i=1,2
!print*, ithread, ' Copy for loop cycle ',i,' OK'
call cudaKernelTest( A )
! Sum up arrays to shared host array B:
B = B + A
enddo
!$OMP END DO
!$OMP END PARALLEL
! 2 threads initiated, B should be 10 values of 2.0:
print*,B
end
% setenv OMP_NUM_THREADS 2
% pgf90 openmp.cuf -mp
% a.out
ithread : 1 idev : 1
ithread : 0 idev : 0
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000 2.000000 2.000000
2.000000 2.000000
xps730:/tmp/qa% |
Hope this helps,
Mat |
|
| Back to top |
|
 |
alfvenwave
Joined: 08 Apr 2010 Posts: 77
|
Posted: Fri Apr 09, 2010 6:17 am Post subject: |
|
|
| Thanks mkcolg - that works..... |
|
| Back to top |
|
 |
Malcolm Bibby
Joined: 16 Nov 2009 Posts: 28
|
Posted: Mon Apr 12, 2010 8:35 am Post subject: |
|
|
What am I missing in the above code. idev never appears on the right hand side and yet its value appears to be set! Would someone please explain!
Malcolm |
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Apr 12, 2010 10:45 am Post subject: |
|
|
Hi Malcom,
"idev" gets set in the call to cudaGetDevice.
| Code: | | iflag = cudaGetDevice(idev) |
- Mat |
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2002 phpBB Group
|