PGI User Forum
simple multi-gpu test program not working

 
brush



Joined: 26 Jun 2012
Posts: 44

Posted: Wed Jun 12, 2013 12:09 pm    Post subject: simple multi-gpu test program not working

I wrote a short Fortran program using OMP + ACC. All it does is set a(i)=i in parallel, so the array should read 1, 2, 3, 4...

The entire code and my compile command:
pgfortran -Minfo -mp -acc -o test test.f
Code:
      program test
      use OMP_LIB

      integer myid,i,N,chunk
      integer a(1:100)

      N = size(a)
      chunk=N/2       ! hardcoded for 2 OMP threads

      call omp_set_num_threads(2)

!$OMP PARALLEL PRIVATE(myid) SHARED(a)
      myid = OMP_GET_THREAD_NUM()
      call acc_set_device_num(myid,acc_device_nvidia)

!$acc kernels do
      do i=myid*chunk+1,myid*chunk+chunk   ! 0th thread does first half
         a(i)=i
      enddo
!$OMP END PARALLEL

      end


At first I thought it was working correctly, because the array a has the expected values and the compiler output seemed okay. However, setting PGI_ACC_TIME=1 shows:
Code:
Accelerator Kernel Timing data
/home/ben/scratch/test.f
  test  thread=0  NVIDIA  devicenum=0
    time(us): 84
    16: compute region reached 1 time
        17: kernel launched 2 times
            grid: [1]  block: [64]
             device time(us): total=22 max=16 min=6 avg=11
            elapsed time(us): total=350 max=327 min=23 avg=175
        20: data copyout reached 2 times
             device time(us): total=62 max=43 min=0 avg=31
/home/ben/scratch/test.f
  test  thread=1  NVIDIA  devicenum=0
    time(us): 0
    16: compute region reached 1 time


or occasionally:
Code:
Accelerator Kernel Timing data
/home/ben/scratch/test.f
  test  thread=0  NVIDIA  devicenum=0
    time(us): 55
    16: compute region reached 1 time
        17: kernel launched 1 time
            grid: [1]  block: [64]
             device time(us): total=32 max=32 min=32 avg=32
            elapsed time(us): total=49 max=49 min=49 avg=49
        20: data copyout reached 1 time
             device time(us): total=23 max=23 min=23 avg=23
/home/ben/scratch/test.f
  test  thread=1  NVIDIA  devicenum=0
    time(us): 22
    16: compute region reached 1 time
        17: kernel launched 1 time
            grid: [1]  block: [64]
             device time(us): total=11 max=11 min=11 avg=11
            elapsed time(us): total=18 max=18 min=18 avg=18
        20: data copyout reached 1 time
             device time(us): total=11 max=11 min=11 avg=11



So it seems I am only using one of the two GPUs, since both threads show devicenum=0.

Compiler accelerator info:
Code:
test:
     12, Parallel region activated
     16, Generating present_or_copyout(a(myid*50+1:myid*50+50))
         Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     17, Loop is parallelizable
         Accelerator kernel generated
         17, !$acc loop gang, vector(64) ! blockidx%x threadidx%x
     20, Parallel region terminated


Any idea on why I am using only 1 GPU (apparently)?
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Thu Jun 13, 2013 4:25 am

The answer is simple, though it's odd to me that the compiler didn't throw an error or warning. Try adding "use openacc" at the top of your code.

Without it (and with PGI_ACC_NOTIFY=3):
Code:
$ ./test
launch CUDA kernel  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=18 device=0 grid=1 block=64
download CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=21 device=0 variable=a bytes=400
launch CUDA kernel  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=18 device=0 grid=1 block=64
download CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=21 device=0 variable=a bytes=400

Accelerator Kernel Timing data
/home/mathomp4/F90Files/OMP-ACC/test.f
  test  thread=0  NVIDIA  devicenum=0
    time(us): 99
    17: compute region reached 1 time
        18: kernel launched 1 time
            grid: [1]  block: [64]
             device time(us): total=43 max=43 min=43 avg=43
            elapsed time(us): total=60 max=60 min=60 avg=60
        21: data copyout reached 1 time
             device time(us): total=56 max=56 min=56 avg=56
/home/mathomp4/F90Files/OMP-ACC/test.f
  test  thread=1  NVIDIA  devicenum=0
    time(us): 110
    17: compute region reached 1 time
        18: kernel launched 1 time
            grid: [1]  block: [64]
             device time(us): total=76 max=76 min=76 avg=76
            elapsed time(us): total=93 max=93 min=93 avg=93
        21: data copyout reached 1 time
             device time(us): total=34 max=34 min=34 avg=34

With 'use openacc':
Code:
$ ./test
launch CUDA kernel  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=18 device=1 grid=1 block=64
download CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=21 device=1 variable=a bytes=400
launch CUDA kernel  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=18 device=0 grid=1 block=64
download CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/test.f function=test line=21 device=0 variable=a bytes=400

Accelerator Kernel Timing data
/home/mathomp4/F90Files/OMP-ACC/test.f
  test  thread=0  NVIDIA  devicenum=0
    time(us): 89
    17: compute region reached 1 time
        18: kernel launched 1 time
            grid: [1]  block: [64]
             device time(us): total=46 max=46 min=46 avg=46
            elapsed time(us): total=64 max=64 min=64 avg=64
        21: data copyout reached 1 time
             device time(us): total=43 max=43 min=43 avg=43
/home/mathomp4/F90Files/OMP-ACC/test.f
  test  thread=1  NVIDIA  devicenum=1
    time(us): 80
    17: compute region reached 1 time
        18: kernel launched 1 time
            grid: [1]  block: [64]
             device time(us): total=45 max=45 min=45 avg=45
            elapsed time(us): total=62 max=62 min=62 avg=62
        21: data copyout reached 1 time
             device time(us): total=35 max=35 min=35 avg=35


I guess my question now is, what is the "correct" behavior of a program like this? Without 'use openacc' it definitely compiled and ran, just not as expected. If 'use openacc' is necessary for the program to run correctly, shouldn't the compiler warn/error? Or is it running "correctly" in each case and is it caveat programmer?

Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Thu Jun 13, 2013 5:52 pm

Hi Matt, Ben,

Quote:
Or is it running "correctly" in each case and is it caveat programmer?

Blame Fortran implicit typing. Without "use openacc", the variable "acc_device_nvidia" is implicitly declared as a real but has an undefined value. Perfectly legal Fortran code, just wrong. Adding "implicit none" would have found this problem.
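To see the implicit typing rule in isolation, here is a minimal sketch in plain Fortran with no OpenACC involved. The two variable names are hypothetical and chosen only for their first letters: a name starting with "a" (like acc_device_nvidia) defaults to REAL, while a name starting with "i" defaults to INTEGER.

```fortran
      program implicit_demo
!     No "implicit none" here, so the default rule applies:
!     names starting with i-n are INTEGER, all others are REAL.
!     Both variable names below are hypothetical.
      acc_like_name = 2.5
      id_like_name  = 2.5

!     The first value prints as a real; the second prints as the
!     integer 2 because the assignment truncates 2.5.
      print *, 'acc_like_name =', acc_like_name
      print *, 'id_like_name  =', id_like_name
      end
```

Neither name is ever declared, yet the program compiles cleanly, which is why a missing module goes unnoticed.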

Code:
% pgf90 test1.f90
PGF90-S-0038-Symbol, acc_device_nvidia, has not been explicitly declared (test1.f90)
  0 inform,   0 warnings,   1 severes, 0 fatal for test

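Putting both suggestions together, a sketch of Ben's test program with "use openacc" and "implicit none" added might look like the following. It assumes the same PGI compile line as in the first post (-mp -acc) and two NVIDIA devices numbered 0 and 1, as in the logs above. The private(i) clause, the "kernels loop" spelling of the directive, and the final print statement are additions for illustration and are not part of the original code.

```fortran
      program test
      use omp_lib
!     The openacc module declares acc_set_device_num and the
!     acc_device_nvidia constant.
      use openacc
!     With implicit none, any undeclared name is a compile error.
      implicit none

      integer myid, i, n, chunk
      integer a(1:100)

      n = size(a)
!     Hardcoded for 2 OMP threads, one per GPU (assumed).
      chunk = n/2

      call omp_set_num_threads(2)

!$omp parallel private(myid, i) shared(a, chunk)
      myid = omp_get_thread_num()
!     Thread 0 selects device 0, thread 1 selects device 1.
      call acc_set_device_num(myid, acc_device_nvidia)

!$acc kernels loop
      do i = myid*chunk+1, myid*chunk+chunk
         a(i) = i
      enddo
!$omp end parallel

!     Spot-check both halves of the array on the host.
      print *, a(1), a(chunk), a(chunk+1), a(n)
      end
```

If both threads run, the spot check should show 1, 50, 51 and 100, and PGI_ACC_TIME=1 should report a different devicenum for each thread.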

- Mat
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Fri Jun 14, 2013 7:15 am

Ah. Of course. I don't deal with .f's that often and my fingers type 'implicit none' by default now.

Thanks, Mat.

Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Fri Jun 14, 2013 8:07 am

Quote:
Thanks, Mat.

Mat
Ha! I've converted you to spelling your name with one T!
All times are GMT - 7 Hours