PGI User Forum

Error with Sample tdot Program with PGI 13.5

 
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Thu Jun 13, 2013 4:40 am    Post subject: Error with Sample tdot Program with PGI 13.5

While looking at and trying to answer the question in this post, I decided to try out the 'tdot' program on page 16 of the PGI OpenACC Users Guide. I faithfully copied and pasted the code into tman.f90, built it as instructed (using BLAS, not ACML), and then got:
Code:
$ pgfortran -mp -acc tman.f90 -Minfo -lblas
tdot:
     35, Parallel region activated
     37, Parallel region terminated
     49, Parallel region activated
     52, Generating copyin(y(offs+1:nsec+offs))
         Generating copyin(x(offs+1:nsec+offs))
         Generating NVIDIA code
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     53, Loop is parallelizable
         Accelerator kernel generated
         53, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     54, Sum reduction generated for z
     56, Parallel region terminated
     59, sum reduction inlined
(821) $ ./a.out
 Host Serial    2489.612915315796     
upload CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/tman.f90 function=tdot line=52 device=1 variable=y bytes=40000
upload CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/tman.f90 function=tdot line=52 device=1 variable=x bytes=40000
upload CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/tman.f90 function=tdot line=52 device=0 variable=y bytes=40000
upload CUDA data  file=/home/mathomp4/F90Files/OMP-ACC/tman.f90 function=tdot line=52 device=0 variable=x bytes=40000
launch CUDA kernel  file=/home/mathomp4/F90Files/OMP-ACC/tman.f90 function=tdot line=53 device=1 grid=40 block=128 sharedbytes=2048
launch CUDA kernel  file=/home/mathomp4/F90Files/OMP-ACC/tman.f90 function=tdot line=53 device=0 grid=40 block=128 sharedbytes=2048
call to cuEventSynchronize returned error 700: Launch failed

Accelerator Kernel Timing data
/home/mathomp4/F90Files/OMP-ACC/tman.f90
  tdot  thread=0  NVIDIA  devicenum=0
    time(us): 238
    52: compute region reached 1 time
        52: data copyin reached 4 times
             device time(us): total=238 max=72 min=45 avg=59
        53: kernel launched 2 times
            grid: [40]  block: [128]
             device time(us): total=0 max=0 min=0 avg=0
/home/mathomp4/F90Files/OMP-ACC/tman.f90
  tdot  thread=1  NVIDIA  devicenum=1
    time(us): 0
    52: compute region reached 1 time
call to cuEventSynchronize returned error 700: Launch failed

This was run with PGI 13.5. So I thought, well, let's try 13.4 (as that seems to be what Mat always does when I pass him errors :) ):
Code:
pgfortran -V13.4 -mp -acc tman.f90 -Minfo -lblas
tdot:
     35, Parallel region activated
     37, Parallel region terminated
     49, Parallel region activated
     52, Generating copyin(y(offs+1:nsec+offs))
         Generating copyin(x(offs+1:nsec+offs))
         Generating NVIDIA code
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     53, Loop is parallelizable
         Accelerator kernel generated
         53, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
     54, Sum reduction generated for z
     56, Parallel region terminated
     59, sum reduction inlined
(840) $ ./a.out
 Host Serial    2489.612915315796     
 Multi-Device Parallel    2489.612915315794     

Accelerator Kernel Timing data
/home/mathomp4/F90Files/OMP-ACC/tman.f90
  tdot  NVIDIA  devicenum=0
        time(us): 89
        52: data copyin reached 2 times
             device time(us): total=57 max=32 min=25 avg=28
        53: kernel launched 1 times
            grid: [40]  block: [128]
             device time(us): total=24 max=24 min=24 avg=24
            elapsed time(us): total=39 max=39 min=39 avg=39
        53: reduction kernel launched 1 times
            grid: [1]  block: [256]
             device time(us): total=8 max=8 min=8 avg=8
            elapsed time(us): total=20 max=20 min=20 avg=20
/home/mathomp4/F90Files/OMP-ACC/tman.f90
  tdot  NVIDIA  devicenum=1
        time(us): 71
        52: data copyin reached 2 times
             device time(us): total=49 max=25 min=24 avg=24
        53: kernel launched 1 times
            grid: [40]  block: [128]
             device time(us): total=14 max=14 min=14 avg=14
            elapsed time(us): total=28 max=28 min=28 avg=28
        53: reduction kernel launched 1 times
            grid: [1]  block: [256]
             device time(us): total=8 max=8 min=8 avg=8
            elapsed time(us): total=19 max=19 min=19 avg=19

So, any idea what I did wrong? Do I have some odd control character in my code I can't see from the cut-and-paste throwing this off? Did the ACC standard change between 13.4 and 13.5?

Matt
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Fri Jun 14, 2013 11:30 am

Hi Matt,

Quote:
So, any idea what I did wrong? Do I have some odd control character in my code I can't see from the cut-and-paste throwing this off? Did the ACC standard change between 13.4 and 13.5?
No, this looks like a compiler error having to do with the multi-device support that is starting to be added. If I compile with "-ta=nvidia", the test seems to work.
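
For reference, a minimal sketch of that workaround, assuming "-ta=nvidia" is simply added to the build line from the original post (the reply does not say whether it was added alongside "-acc" or replaced it). The BUILD_CMD variable and the PATH check are illustrative only, so the snippet does nothing harmful on a machine without the PGI compilers:

```shell
# Build line from the original post with -ta=nvidia added.
# Assumption: the flag is added next to -acc; its exact placement is not given in the thread.
BUILD_CMD="pgfortran -mp -acc -ta=nvidia tman.f90 -Minfo -lblas"

if command -v pgfortran >/dev/null 2>&1; then
    # Compile, and run the resulting binary only if the build succeeded.
    $BUILD_CMD && ./a.out
else
    # No PGI compiler on PATH: just show the command that would be run.
    echo "pgfortran not found; would run: $BUILD_CMD"
fi
```

If the workaround applies, the run should print a "Multi-Device Parallel" line after "Host Serial", as in the 13.4 output above, instead of stopping at the cuEventSynchronize error.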

I've added TPR#19419 to address the problem.

Thanks,
Mat
jtull



Joined: 30 Jun 2004
Posts: 445

Posted: Fri Nov 01, 2013 4:00 pm    Post subject: 19419 - OACC: tdot example from OpenACC User's guide fails i

Matt,

TPR 19419 has been fixed in the 13.10 release.

thanks,
dave
All times are GMT - 7 Hours