PGI User Forum


no devices detected
PGI User Forum Forum Index -> Accelerator Programming
brush



Joined: 26 Jun 2012
Posts: 44

Posted: Tue Jul 09, 2013 4:39 pm    Post subject: no devices detected

In my MPI code, I assign GPUs to MPI processes with:
Code:

        #ifdef _OPENACC
          call acc_init(acc_device_default)
          dtype=acc_get_device_type()
          numdevices = acc_get_num_devices(acc_device_nvidia)
          print *, "device type=", dtype
          print *, "mpi rank = ", MyId
          print *, "# devices on my node = ",numdevices
          mydevice = mod(MyId,numdevices)
          call acc_set_device_num(mydevice,acc_device_nvidia)
        #endif


At run time, my print messages show that I am not detecting any GPUs (I run on 8 nodes, 1 MPI process per node):
Code:

 device type=            0
 mpi rank =             0
 # devices on my node =             0

 device type=            0
 mpi rank =             4
 # devices on my node =             0

 device type=            0
 mpi rank =             5
 # devices on my node =             0

etc.


However, pgaccelinfo has no trouble finding the GPU:

Code:
-bash-3.2$ pgaccelinfo
CUDA Driver Version:           5050
NVRM version: NVIDIA UNIX x86_64 Kernel Module  319.23  Thu May 16 19:36:02 PDT 2013

Device Number:                 0
Device Name:                   Tesla C1060
Device Revision Number:        1.3
Global Memory Size:            4294770688
Number of Multiprocessors:     30
Number of Cores:               240
Concurrent Copy and Execution: Yes
Total Constant Memory:         65536
Total Shared Memory per Block: 16384
Registers per Block:           16384
Warp Size:                     32
Maximum Threads per Block:     512
Maximum Block Dimensions:      512, 512, 64
Maximum Grid Dimensions:       65535 x 65535 x 1
Maximum Memory Pitch:          2147483647B
Texture Alignment:             256B
Clock Rate:                    1296 MHz
Execution Timeout:             No
Integrated Device:             No
Can Map Host Memory:           Yes
Compute Mode:                  default
Concurrent Kernels:            No
ECC Enabled:                   No
Memory Clock Rate:             800 MHz
Memory Bus Width:              512 bits
Max Threads Per SMP:           1024
Async Engines:                 1
Unified Addressing:            No
Initialization time:           657481 microseconds
Current free memory:           4237299456
Upload time (4MB):             1153 microseconds ( 726 ms pinned)
Download time:                 1053 microseconds ( 772 ms pinned)
Upload bandwidth:              3637 MB/sec (5777 MB/sec pinned)
Download bandwidth:            3983 MB/sec (5433 MB/sec pinned)


Removing the assignment code altogether (since there's only one GPU per node anyway) still shows that I am having trouble detecting the GPU:
Code:

call to cuInit returned error 100: No device


Any common causes of this sort of behavior? I remembered to include "use openacc" in my code this time.
mkcolg



Joined: 30 Jun 2004
Posts: 6473
Location: The Portland Group Inc.

Posted: Tue Jul 09, 2013 5:06 pm

Hi Ben,

I have no idea. "acc_get_num_devices" should only return 0 when the system doesn't have a GPU or the GPU isn't enabled. Are you submitting your job via qsub? Could it be giving you some non-GPU-enabled nodes? Maybe the environment isn't being set up (i.e., the CUDA driver isn't loaded)?

While I don't think it will help, here's the code I use to set devices in MPI.

Code:
#ifdef _OPENACC

function setDevice(nprocs,myrank)

  use iso_c_binding
  use openacc
  implicit none
  include "mpif.h"

  interface
    function gethostid() BIND(C)
      use iso_c_binding
      integer (C_INT) :: gethostid
    end function gethostid
  end interface

  integer :: nprocs, myrank
  integer, dimension(nprocs) :: hostids, localprocs
  integer :: hostid, ierr, numdev, mydev, i, numlocal
  integer :: setDevice

! get the hostids so we can determine what other processes are on this node
  hostid = gethostid()
  CALL mpi_allgather(hostid,1,MPI_INTEGER,hostids,1,MPI_INTEGER, &
                     MPI_COMM_WORLD,ierr)

! determine which processes are on this node
  numlocal=0
  localprocs=0
  do i=1,nprocs
    if (hostid .eq. hostids(i)) then
      localprocs(i)=numlocal
      numlocal = numlocal+1
    endif
  enddo

! get the number of devices on this node
  numdev = acc_get_num_devices(ACC_DEVICE_NVIDIA)

  if (numdev .lt. 1) then
    print *, 'ERROR: There are no devices available on this host. ABORTING.', myrank
    stop
  endif

! print a warning if the number of devices is less than the number
! of processes on this node.  Having multiple processes share devices is not
! recommended.
  if (numdev .lt. numlocal) then
   if (localprocs(myrank+1).eq.1) then
     ! print the message only once per node
   print *, 'WARNING: The number of processes is greater than the number of GPUs.', myrank
   endif
   mydev = mod(localprocs(myrank+1),numdev)
  else
   mydev = localprocs(myrank+1)
  endif

 call acc_set_device_num(mydev,ACC_DEVICE_NVIDIA)
 call acc_init(ACC_DEVICE_NVIDIA)
 setDevice = mydev

end function setDevice
#endif
brush



Joined: 26 Jun 2012
Posts: 44

Posted: Thu Jul 11, 2013 10:50 am

Hi Mat, thanks for the feedback.

I made a small reproducing example. All it does is assign GPUs in an MPI environment with your setDevice routine. As expected, the code aborts because it detects 0 devices. Interestingly, though, your CUDA version of setDevice works (found here: http://www.pgroup.com/lit/articles/insider/v3n3a2.htm). I am wondering if this problem is specific to Dirac, as that is all I've tested on. The two codes are pasted below.

Compiled with: mpif90 -Mcuda -o testmpiv4 testmpiv4.F90
Job submitted with: qsub -I -V -q dirac_reg -l walltime=10:00 -l nodes=2:ppn=1
Run with: mpirun -np 2 ./testmpiv4
And modules: pgi/12.3, pgi-gpu/12.3
Code:
! GPU assignment done by setDevice function utilizing CUDA
! correctly assigns GPUs across multiple nodes
      program testmpiv4
      use cudafor
      include "mpif.h"

      integer ierr, myid,numprocs
      integer devnum, numdev

      call MPI_INIT(ierr)

      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

! assign gpu
      numdev=1
      ierr = cudaGetDeviceCount(numdev)
      if (ierr.ne.0) then
        print*,cudaGetErrorString(ierr)
        stop
      endif
      if(numdev.lt.1) then
        print *, 'ERROR:NO DEVICES FOUND.'
        stop
      endif
      devnum=setDevice(numprocs, myid)
      ierr = cudaSetDevice(devnum)

      call MPI_FINALIZE(ierr)
      end

!cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
! Mat's setDevice function
!cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
        function setDevice(nprocs,myrank)
          use iso_c_binding
          use cudafor
          implicit none
          include "mpif.h"

          interface
            function gethostid() BIND(C)
              use iso_c_binding
              integer (C_INT) :: gethostid
            end function gethostid
          end interface

          integer :: nprocs, myrank
          integer, dimension(nprocs) :: hostids, localprocs
          integer :: hostid, ierr, numdev, mydev, i, numlocal
          integer :: setDevice

        ! get the hostids so we can determine what other processes are on this node
        hostid = gethostid()
        CALL mpi_allgather(hostid,1,MPI_INTEGER,hostids,1,MPI_INTEGER, &
                             MPI_COMM_WORLD,ierr)
        ! determine which processes are on this node
          numlocal=0
          localprocs=0
          do i=1,nprocs
            if (hostid .eq. hostids(i)) then
              localprocs(i)=numlocal
              numlocal = numlocal+1
            endif
          enddo

        ! get the number of devices on this node
          ierr = cudaGetDeviceCount(numdev)
          print*,"the number of devices on my node is ", numdev

          if (numdev .lt. 1) then
            print *, 'ERROR: no devices available on this host. ABORTING.', myrank
            stop
          endif

        ! print a warning if the number of devices is less than the number
        ! of processes on this node.  Having multiple processes share devices is not
        ! recommended.
          if (numdev .lt. numlocal) then
           if (localprocs(myrank+1).eq.1) then
             ! print the message only once per node
           print *, 'WARNING: the number of processes is greater than the number of GPUs.', myrank
           endif
           mydev = mod(localprocs(myrank+1),numdev)
          else
           mydev = localprocs(myrank+1)
          endif

         ierr = cudaSetDevice(mydev)
         setDevice = mydev

        end function setDevice


Compiled with: mpif90 -acc -o testmpiv5 testmpiv5.F90
Job sub with: qsub -I -V -q dirac_reg -l walltime=10:00 -l nodes=2:ppn=1
Run with: mpirun -np 2 ./testmpiv5
And modules: pgi/12.3, pgi-gpu/12.3
Code:
! GPU assignment done by setDevice function utilizing OACC
! fails to detect GPUs across multiple nodes
      program testmpiv5
      include "mpif.h"

      integer ierr, myid,numprocs
      #ifdef _OPENACC
        integer devnum, setDevice
      #endif

      call MPI_INIT(ierr)

      call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

      #ifdef _OPENACC
        devnum=setDevice(numprocs, myid)
      #endif
      print *, "my dev is ",devnum

      call MPI_FINALIZE(ierr)
      end

!cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
! Mat's setDevice function
!cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
        #ifdef _OPENACC

        function setDevice(nprocs,myrank)

          use iso_c_binding
          use openacc
          implicit none
          include "mpif.h"
          interface
            function gethostid() BIND(C)
              use iso_c_binding
              integer (C_INT) :: gethostid
            end function gethostid
          end interface

          integer :: nprocs, myrank
          integer, dimension(nprocs) :: hostids, localprocs
          integer :: hostid, ierr, numdev, mydev, i, numlocal
          integer :: setDevice

        ! get the hostids so we can determine what other processes are on this node
        hostid = gethostid()
        CALL mpi_allgather(hostid,1,MPI_INTEGER,hostids,1,MPI_INTEGER, &
                             MPI_COMM_WORLD,ierr)
        ! determine which processes are on this node
          numlocal=0
          localprocs=0
          do i=1,nprocs
            if (hostid .eq. hostids(i)) then
              localprocs(i)=numlocal
              numlocal = numlocal+1
            endif
          enddo

        ! get the number of devices on this node
          numdev = acc_get_num_devices(ACC_DEVICE_NVIDIA)
          print*,"the number of devices on my node is ", numdev

          if (numdev .lt. 1) then
            print *, 'ERROR: no devices available on this host. ABORTING.', myrank
            stop
          endif

        ! print a warning if the number of devices is less than the number
        ! of processes on this node.  Having multiple processes share devices is not
        ! recommended.
          if (numdev .lt. numlocal) then
           if (localprocs(myrank+1).eq.1) then
             ! print the message only once per node
           print *, 'WARNING: the number of processes is greater than the number of GPUs.', myrank
           endif
           mydev = mod(localprocs(myrank+1),numdev)
          else
           mydev = localprocs(myrank+1)
          endif

         call acc_set_device_num(mydev,ACC_DEVICE_NVIDIA)
         call acc_init(ACC_DEVICE_NVIDIA)
         setDevice = mydev

        end function setDevice
        #endif
mkcolg



Joined: 30 Jun 2004
Posts: 6473
Location: The Portland Group Inc.

Posted: Fri Jul 12, 2013 2:30 pm

Hi Ben,

I'll get on Dirac later today, but it might just be because you're using 12.3. Can you try again with 12.9? OpenACC wasn't fully supported until 12.6.

- Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6473
Location: The Portland Group Inc.

Posted: Fri Jul 12, 2013 4:18 pm

Hi Ben,

Don't you need to add ":fermi" or ":tesla" to the end of your qsub command to get a GPU? (See: http://www.nersc.gov/users/computational-systems/dirac/running-jobs/batch/)

While I did this in interactive mode, "qsub -I -V -q dirac_int -l nodes=1:ppn=8:fermi", the command worked fine with 12.3.

- Mat

Code:
[@dirac41 ~/tests]$ mpif90 -V

pgf90 12.3-0 64-bit target on x86-64 Linux -tp nehalem
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2012, STMicroelectronics, Inc. All Rights Reserved.
[@dirac41 ~/tests]$ mpif90 -acc -Minfo=accel setdevice.F90
[@dirac41 ~/tests]$ mpirun -np 1 ./a.out
the number of devices on my node is 1
my dev is 0
[@dirac41 ~/tests]$ module list
Currently Loaded Modulefiles:
1) modules 3) moab/7.2.3-r11-b103 5) pgi-gpu/12.3 7) altd/1.0
2) nsg/1.2.0 4) torque/4.2.3.1 6) openmpi/1.4.5 8) usg-default-modules/1.0