PGI User Forum


where am I going wrong?

 
PGI User Forum -> Accelerator Programming
cablesb



Joined: 21 Jan 2010
Posts: 33

Posted: Thu Sep 27, 2012 4:29 pm    Post subject: where am I going wrong?

I am trying to run the first Fortran program from the June 2009 issue of the PGI Insider, and I am getting some weird results. It's probably something stupid I did, but I just can't see it. So, with your indulgence, here is my code:

Code:
program dbl_it

integer :: n, i
real,dimension(:),allocatable :: a, r, e
character(10) :: arg1

if (iargc() > 0) then
  call getarg(1,arg1)
  read(arg1,'(i10)') n
else
  n=100000
endif

allocate(a(n),r(n),e(n))

do i=1,n
  a(i)=2.*i
enddo

!$acc region
do i=1,n
  r(i)=2.*a(i)
enddo
!$acc end region

do i=1,n
  e(i)=2.*a(i)
enddo

do i=1,n
  if (r(i) /= e(i)) then
    print *, i,r(i),e(i)
    stop 'error found'
  endif
enddo

print *, n,'iterations completed'

end program


Note that the default for "n" is 100000. As long as I keep n less than or equal to 100000, everything is OK:

Code:
[CUDA]$ ./a.out
       100000 iterations completed
[CUDA]$ ./a.out 50000
        50000 iterations completed


But if I go beyond 100000, woe is me:

Code:
[CUDA]$ ./a.out 100001
       100001    0.000000        400004.0   
Warning: ieee_inexact is signaling
error found


It looks like somehow the GPU is picking up on that 100000 default value and only populating the array "r" up to 100000.

If I change the default to, say, 20000, then the GPU populates "r" only up to 20000. I put in a bunch of diagnostic output to look at n, and n always prints correctly. But the GPU won't go beyond the default value.

Many apologies if I am doing something obviously wrong. I have looked at this over and over and I can't see it. Thanks.
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Fri Sep 28, 2012 10:09 am

Hi Cablesb,

This looks like a compiler error to me. If you look at the -Minfo output, you'll see that the compiler is using the default value of "100000", rather than "n", as the size of the copies:
Code:
% pgf90 -ta=nvidia -Minfo=accel test.f90
dbl_it:
     23, Generating present_or_copyin(a(1:100000))
         Generating present_or_copyout(r(1:100000))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     24, Loop is parallelizable
         Accelerator kernel generated
         24, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 10 registers; 48 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 14 registers; 0 shared, 64 constant, 0 local memory bytes


In the original code from the PGI Insider article, the compiler correctly uses "n" for the size:
Code:
% pgf90 -ta=nvidia -Minfo=accel test2.f90
main:
     21, Generating present_or_copyin(a(1:n))
         Generating present_or_copyout(r(1:n))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     22, Loop is parallelizable
         Accelerator kernel generated
         22, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 10 registers; 52 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 14 registers; 0 shared, 68 constant, 0 local memory bytes


The main difference between the two codes is the "if ( n .le. 0 ) n = 100000" statement in the original. Either adding this if statement or adding explicit copy clauses for "a" and "r" will work around the problem, since in both cases the compiler no longer assumes the transfer size is 100000.

Code:
% cat test.f90
program dbl_it

integer :: n,i
real,dimension(:),allocatable :: a, r, e
character(10) :: arg1

if (iargc() > 0) then
  call getarg(1,arg1)
  read(arg1,'(i10)') n
else
  n=100000
endif

! ADD THIS
if( n .le. 0 ) n = 100000

allocate(a(n),r(n),e(n))

do i=1,n
  a(i)=2.*i
enddo

! OR ADD THIS
!$acc region copyin(a), copyout(r)
do i=1,n
  r(i)=2.*a(i)
enddo
!$acc end region

do i=1,n
  e(i)=2.*a(i)
enddo

do i=1,n
  if (r(i) /= e(i)) then
    print *, i,r(i),e(i)
    stop 'error found'
  endif
enddo

print *, n,'iterations completed'

end program   
% pgf90 -ta=nvidia -Minfo=accel test.f90
dbl_it:
     24, Generating present_or_copyout(r(:))
         Generating present_or_copyin(a(:))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     25, Loop is parallelizable
         Accelerator kernel generated
         25, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 10 registers; 64 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 14 registers; 0 shared, 80 constant, 0 local memory bytes
% a.out 1000000
      1000000 iterations completed

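A related sketch, assuming a compiler that supports the OpenACC "kernels" construct (the standard construct that roughly corresponds to the PGI Accelerator "region" directive used above): the same loop with explicit array bounds in the data clauses, so the transfer size is spelled out as the run-time value of n instead of being inferred by the compiler. The module name dbl_it_mod and the routine double_it are hypothetical, and whether this form avoids TPR#18950 in a particular compiler release would still need to be confirmed from the -Minfo=accel output.

```fortran
! Hypothetical module sketching the explicit-bounds workaround with the
! OpenACC "kernels" construct. Not taken from the PGI Insider article.
module dbl_it_mod
  implicit none
contains
  ! r(1:n) = 2*a(1:n); n must not exceed size(a) or size(r).
  subroutine double_it(a, r, n)
    real, intent(in)    :: a(:)
    real, intent(out)   :: r(:)
    integer, intent(in) :: n
    integer :: i
    ! Explicit sections: copy exactly n elements of a to the device and
    ! exactly n elements of r back, whatever default n had at compile time.
    !$acc kernels copyin(a(1:n)) copyout(r(1:n))
    do i = 1, n
      r(i) = 2.*a(i)
    end do
    !$acc end kernels
  end subroutine double_it
end module dbl_it_mod
```

From the original program it would be called as "call double_it(a, r, n)" after the arrays are allocated and "a" is filled. When compiled without accelerator support, the !$acc lines are treated as comments and the loop simply runs on the host.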

I've submitted TPR#18950 to engineering for further investigation. Note that this bug has been present since the first release (a "zero-day" bug), so it occurs in every compiler version. Thanks for finding it!

- Mat
cablesb



Joined: 21 Jan 2010
Posts: 33

Posted: Fri Sep 28, 2012 1:24 pm

So it wasn't me??? Wow. That's a first! :) Anyway, thanks for the tip, and glad to be of service.
All times are GMT - 7 Hours