PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

64-bit integers in CUDA Fortran atomics

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Programming and Compiling
View previous topic :: View next topic  
Author Message
dcwarren



Joined: 18 Jun 2012
Posts: 29

PostPosted: Tue Jan 01, 2013 4:59 pm    Post subject: 64-bit integers in CUDA Fortran atomics Reply with quote

Happy New Year, everyone!

According to the CUDA specs (http://www.scribd.com/doc/84859529/136/B-11-1-1-atomicAdd), atomic operations can be performed on 64-bit integers since CC 1.2. However, when I try to compile the following code with select device integers declared as "integer(kind=8)" I get the error shown in triplicate below:
Quote:
PGF90-S-0155-Could not resolve generic procedure atomicadd (watchdog_test.f90:23)
PGF90-S-0155-Could not resolve generic procedure atomicadd (watchdog_test.f90:24)
PGF90-S-0155-Could not resolve generic procedure atomicadd (watchdog_test.f90:25)


The code follows. Switching the "kind=4" variables to "kind=8" is sufficient to trigger the error on my machine.
Code:
! ***************************************************************************
! ***************************************************************************
module gpu_kernel

integer, device :: num_loops
integer(kind=4), device :: i, j, k

contains

! ***************************************************************************
attributes(global) subroutine cuda_kernel()

use cudafor

implicit none

! Local variables
integer :: n
integer(kind=4) :: o
real :: temp

do n = 1, num_loops
  o = atomicadd(i, 1)
  o = atomicadd(j, 1)
  o = atomicadd(k, 1)
enddo

return
end subroutine

! ***************************************************************************
subroutine cuda_device_select()

use cudafor

implicit none

! Local variables
integer :: cudastat, nDevices, i, desired_device
real :: max_version, version
type(cudadeviceprop) :: prop

cudastat = cudaGetDeviceCount(nDevices)

max_version = 0.0
do i = 0, nDevices-1
  cudastat = cudaGetDeviceProperties(prop, i)
  version = float(prop%major) + 0.1*float(prop%minor)
  if(version .gt. max_version) then
    max_version = version
    desired_device = i
  endif
enddo

cudastat = cudaSetDevice(desired_device)
cudastat = cudaGetDeviceProperties(prop, desired_device)
write(*,"('   Device ',a,' selected.')") trim(prop%name)
write(*,"('   CUDA compute capability is ',i0,'.',i0)") prop%major,       &!&
     prop%minor

return
end subroutine

end module

! ***************************************************************************
! ***************************************************************************
program watchdog_test

use cudafor
use gpu_kernel

implicit none

! Local variables
integer(kind=4) :: m
real :: temp

i = 0;   j = 0;   k = 0

call cuda_device_select()

num_loops = 20000

call cuda_kernel<<<16,256>>>()

m = i; write(*,"(i0)") m
m = j; write(*,"(i0)") m
m = k; write(*,"(i0)") m

stop
end program


Am I doing something wrong, or is this a difference between CUDA Fortran and the official specs for CUDA?

Edit to add: If anyone reading this knows a slicker way to select a GPU, I'm all ears. The documentation for cudaChooseDevice didn't make a lot of sense to me.
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

PostPosted: Mon Jan 07, 2013 11:00 am    Post subject: Reply with quote

Hi dcwarren,

Happy New Year to you as well!

Quote:
Am I doing something wrong, or is this a difference between CUDA Fortran and the official specs for CUDA?
The generic procedure for atomic add is expecting that both arguments are either kind 4 or 8. Hence, you need to set the constant "1" to kind 8 as well.

Code:
do n = 1, num_loops
  o = atomicadd(i, 1_8)
  o = atomicadd(j, 1_8)
  o = atomicadd(k, 1_8)
enddo


Quote:
If anyone reading this knows a slicker way to select a GPU, I'm all ears. The documentation for cudaChooseDevice didn't make a lot of sense to me.
I don't see anything wrong with your method. I wrote one for this PGInsdier article for use when selecting which MPI process gets which device. No better than what you have, it just works under a multiple process/multiple device context while yours is for a single process/multi-device context.

- Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Programming and Compiling All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group