PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

CUDA-x86.

Problem with negative increment using PGI 12.3

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
xlapillonne



Joined: 16 Feb 2011
Posts: 69

PostPosted: Wed May 02, 2012 8:50 am    Post subject: Problem with negative increment using PGI 12.3 Reply with quote

Hi,

I am seeing some differences beyond what can be expected from rounding error with an acc code. I have been able to isolate one of the Kernel and to reproduce the problem in a small test code:

Code:

program main
  implicit none
  integer*4 :: N,ke_soil,i,istarts,iends,kso
  real*8, allocatable :: zagc(:,:), zagd(:,:), zage(:,:), t_so_new(:,:)
  logical, allocatable :: llandmask(:)

  N=1E3
  ke_soil=4
  istarts=1
  iends=N
allocate(llandmask(N),zagc(N,0:ke_soil), zagd(N,0:ke_soil), zage(N,0:ke_soil+1))
allocate(t_so_new(N,0:ke_soil))

!init

DO i=istarts, iends
   llandmask(i)=.T.
   zage(i,ke_soil+1)=290.0*(1.0 + 1.0/(i))
END DO

DO kso=0,ke_soil
   DO i=istarts, iends
      zage(i,kso)=(1.0 + 1.0/(i+kso))
      zagd(i,kso)=290.0*(1.0 + 1.0/(i+kso))
      zagc(i,kso)=(1.0 + 1.0/(i+kso))/288.0
   END DO
END DO


 
  !$acc region do seq
  DO kso = ke_soil,0,-1
     !$acc do kernel
      DO i = istarts, iends
        IF (llandmask(i)) THEN          ! land-points only
          zage(i,kso)     = zagd(i,kso) - zagc(i,kso)*zage(i,kso+1)
        ! The surface temperature computed by t_so(i,0,nnew)=zage(i,0) is
        ! presently unused
          t_so_new(i,kso) = zage(i,kso)
        END IF          ! land-points only
      END DO
  END DO                ! soil layers
  !$acc end region



  DO kso=0,ke_soil
     print*, 'kso, mean(t_so_new)', kso, sum(t_so_new(:,kso))/N
  END DO


end program main



If I compile with:
-ta=host

I get:
> aprun -n 1 ./test_k_neg_loop
kso, mean(t_so_new) 0 291.1523301069119
kso, mean(t_so_new) 1 290.8646212574333
kso, mean(t_so_new) 2 290.7209103217591
kso, mean(t_so_new) 3 290.6252217333774
kso, mean(t_so_new) 4 290.5473285022244

wit -ta=nvidia
> aprun -N 1 ./test_k_neg_loop
kso, mean(t_so_new) 0 292.1672628741607
kso, mean(t_so_new) 1 291.8775595150368
kso, mean(t_so_new) 2 291.7328524090394
kso, mean(t_so_new) 3 291.6364771839715
kso, mean(t_so_new) 4 290.5473285022244


I actually don't know whether the issue is coming from the negative incrememnt in the k loop or from the array starting at 0.

Thanks for your help,

Xavier
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

PostPosted: Wed May 02, 2012 10:48 am    Post subject: Reply with quote

Hi Xavier,

It looks like the compiler is caching zage in [256x2] blocks and which is most likely causing data a synchronization issue. The work around would be to either disable caching, "-ta=nvidia,nocache", or invert your loops.
Code:
  !$acc region do kernel
  DO i = istarts, iends
    DO kso = ke_soil,0,-1


Code:
% pgf90 -ta=nvidia,time neg.f90 -V12.4 -Minfo
main:
     32, Generating copyin(zagd(1:1000,0:4))
         Generating copy(zage(1:1000,0:5))
         Generating copyin(zagc(1:1000,0:4))
         Generating copy(t_so_new(1:1000,0:4))
         Generating copyin(llandmask(1:1000))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     33, Loop carried dependence of 'zage' prevents parallelization
         Loop carried backward dependence of 'zage' prevents vectorization
     35, Loop is parallelizable
         Accelerator kernel generated
         33, Cached references to size [256x2] block of 'zage'
         35, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             CC 1.3 : 10 registers; 4172 shared, 16 constant, 0 local memory bytes; 75% occupancy
             CC 2.0 : 15 registers; 4116 shared, 72 constant, 0 local memory bytes; 100% occupancy
     49, sum reduction inlined
% a.out
 kso, mean(t_so_new)            0    292.1672628741607     
 kso, mean(t_so_new)            1    291.8775595150368     
 kso, mean(t_so_new)            2    291.7328524090394     
 kso, mean(t_so_new)            3    291.6364771839715     
 kso, mean(t_so_new)            4    290.5473285022244     

% pgf90 -ta=nvidia,nocache,time neg.f90 -V12.4 -Minfo
main:
     32, Generating copyin(zagd(1:1000,0:4))
         Generating copy(zage(1:1000,0:5))
         Generating copyin(zagc(1:1000,0:4))
         Generating copy(t_so_new(1:1000,0:4))
         Generating copyin(llandmask(1:1000))
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     33, Loop carried dependence of 'zage' prevents parallelization
         Loop carried backward dependence of 'zage' prevents vectorization
     35, Loop is parallelizable
         Accelerator kernel generated
         33, !$acc do seq
         35, !$acc do parallel, vector(256) ! blockidx%x threadidx%x
             CC 1.3 : 11 registers; 60 shared, 8 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 20 registers; 4 shared, 72 constant, 0 local memory bytes; 100% occupancy
     49, sum reduction inlined
% a.out
 kso, mean(t_so_new)            0    291.1523301069119     
 kso, mean(t_so_new)            1    290.8646212574333     
 kso, mean(t_so_new)            2    290.7209103217591     
 kso, mean(t_so_new)            3    290.6252217333774     
 kso, mean(t_so_new)            4    290.5473285022244   


I've added TPR#18671 to track this issue.

Best Regards,
Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group