PGI User Forum

Acc unroll bug for pgi 12.X ... 13.4 , ok for pgi 11.10

 
PGI User Forum Forum Index -> Accelerator Programming
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Fri Apr 19, 2013 6:46 am    Post subject: Acc unroll bug for pgi 12.X ... 13.4 , ok for pgi 11.10

Hello ( ;-) Same player ... )

The !$acc ... unroll(X) clause produces wrong results with pgi/12.X through pgi/13.4.

It works correctly with pgi/11.10.

I discovered it with the simple STREAM benchmark, where my new Titan card
delivers 337 GB/s, more than the 288 GB/s theoretically possible!
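
For context, a STREAM-style add kernel (a = b + c) moves three words per element, so a reported rate follows from the array size and the kernel time. The sketch below is only an illustration under stated assumptions: 4-byte REALs, 1 GB = 1.0e9 bytes, the array size of the test program in this thread, and a hypothetical elapsed time chosen so that the result matches the 337 GB/s figure. The name stream_add_gbs is made up for this example. A rate above the card's peak suggests the kernel did not touch all the data.

```fortran
module stream_check
  implicit none
contains
  ! Bandwidth in GB/s (1 GB = 1.0e9 bytes) for an add kernel a(i) = b(i) + c(i):
  ! two loads and one store per element.
  pure function stream_add_gbs(n, bytes_per_word, seconds) result(gbs)
    integer, intent(in) :: n, bytes_per_word
    double precision, intent(in) :: seconds
    double precision :: gbs
    gbs = 3.0d0 * dble(n) * dble(bytes_per_word) / seconds / 1.0d9
  end function stream_add_gbs
end module stream_check

program bandwidth_sanity
  use stream_check
  implicit none
  integer, parameter :: n = 1024*1024                 ! assumed array size
  double precision, parameter :: peak_gbs = 288.0d0   ! peak quoted in the post
  double precision :: t, gbs

  ! Hypothetical timing, chosen so the result matches the reported 337 GB/s
  t = 3.0d0 * dble(n) * 4.0d0 / 337.0d9
  gbs = stream_add_gbs(n, 4, t)

  print '(a,f7.1,a)', "measured: ", gbs, " GB/s"
  if (gbs > peak_gbs) print *, "above peak: the kernel probably skipped work"
end program bandwidth_sanity
```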

The simple code below compares the same computation done on the host and on the device with unroll(8).
Code:

PROGRAM TEST_UNROLL

  IMPLICIT NONE

  INTEGER,PARAMETER :: n = 1024*1024
  REAL, DIMENSION(n) :: ad,bd,cd,ah,bh,ch

  INTEGER,PARAMETER :: NUNROLL = 8
 
  INTEGER :: i

! Init host & device array in the same way
  do i=1,n
     ah(i) = 0.0 ;  bh(i) = 0.5  * i*i ; ch(i) = 0.25 * i*i*i
  end do

  ad(:) = ah(:) ;  bd(:) = bh(:) ; cd(:) = ch(:)
 
! Host part
  do i=1,n
     ah(i) =  bh(i) +  ch(i)
  end do

! Device part with unrolling
  ! acc kernels loop gang, vector unroll(NUNROLL)
  !$acc region do parallel unroll(NUNROLL)
  do i=1,n
     ad(i) =  bd(i) +  cd(i)
  end do

  print*, "ERR(Device - Host) =", ad(n) - ah(n) ; call flush(6)
         
END PROGRAM TEST_UNROLL


Compilation with pgi 11.10
Code:

pgf90 --version -Minfo=acc -ta=host,nvidia,keepgpu test_unroll.f90 -o test_unroll_pgi11.10 2>&1 | egrep 'target|unroll'
test_unroll:
         27, !$acc do parallel unroll(8), vector(256) ! blockidx%x threadidx%x
pgf90 11.10-0 64-bit target on x86-64 Linux -tp nehalem



Execution with pgi 11.10
Code:

 test_unroll_pgi11.10
 ERR(Device - Host) =    0.000000


Compilation with pgi 12.X through 13.4
Code:

pgf90 --version -Minfo=acc -ta=host,nvidia,keepgpu test_unroll.f90 -o test_unroll_pgi13.04 2>&1 | egrep 'target|unroll'
test_unroll:
         27, !$acc loop gang unroll(8), vector(128) ! blockidx%x threadidx%x
pgf90 13.4-0 64-bit target on x86-64 Linux -tp nehalem


Execution with pgi 13.4
Code:

 test_unroll_pgi13.04
 ERR(Device - Host) =  -2.8823089E+17
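
The test above compares only the last element, ad(n) - ah(n). A variant that checks every element shows how many results are wrong and where the first one is. This is a minimal sketch, not part of the original report: the helper names count_mismatches and first_mismatch are made up here, and the exact comparison assumes host and device both perform plain IEEE single-precision addition. The directive is copied from the original test; a compiler without PGI Accelerator support treats it as a comment and runs the loop on the host.

```fortran
module array_check
  implicit none
contains
  ! Number of positions where the two arrays differ
  pure function count_mismatches(a, b) result(nbad)
    real, intent(in) :: a(:), b(:)
    integer :: nbad
    nbad = count(a /= b)
  end function count_mismatches

  ! Index of the first differing element, or 0 if the arrays agree
  pure function first_mismatch(a, b) result(idx)
    real, intent(in) :: a(:), b(:)
    integer :: idx, i
    idx = 0
    do i = 1, size(a)
       if (a(i) /= b(i)) then
          idx = i
          exit
       end if
    end do
  end function first_mismatch
end module array_check

program test_unroll_full
  use array_check
  implicit none
  integer, parameter :: n = 1024*1024
  integer, parameter :: NUNROLL = 8
  real, dimension(n) :: ad, bd, cd, ah, bh, ch
  integer :: i

  ! Same initialisation as the original test
  do i = 1, n
     ah(i) = 0.0 ; bh(i) = 0.5 * i*i ; ch(i) = 0.25 * i*i*i
  end do
  ad(:) = ah(:) ; bd(:) = bh(:) ; cd(:) = ch(:)

  ! Host reference
  do i = 1, n
     ah(i) = bh(i) + ch(i)
  end do

  ! Device loop, directive copied from the original test
  !$acc region do parallel unroll(NUNROLL)
  do i = 1, n
     ad(i) = bd(i) + cd(i)
  end do

  print *, "mismatching elements =", count_mismatches(ad, ah), " of", n
  print *, "first mismatch at i  =", first_mismatch(ad, ah)
end program test_unroll_full
```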


See you,
Juan

PS1: the bug is the same with PGI Accelerator or OpenACC directives.
PS2: looking at the generated GPU code, the compiler forgot to increment the pointers to the ad, bd and cd arrays.
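
To make PS2 concrete, here is a host-side model of what unrolling by 8 is expected to do, next to a hypothetical variant in which the per-copy offset is never applied. This is only an illustration of the idea of a missing pointer increment. It is not the code the compiler generates, and it is not claimed to reproduce the exact error value printed above. The names add_unrolled and add_unrolled_no_offset are invented for this sketch.

```fortran
module unroll_model
  implicit none
  integer, parameter :: NUNROLL = 8
contains
  ! Correct manual unroll: each copy of the loop body uses its own offset k
  pure function add_unrolled(b, c) result(a)
    real, intent(in) :: b(:), c(:)
    real :: a(size(b))
    integer :: i, k, nmain
    nmain = size(b) - mod(size(b), NUNROLL)
    a = 0.0
    do i = 1, nmain, NUNROLL
       do k = 0, NUNROLL - 1
          a(i+k) = b(i+k) + c(i+k)
       end do
    end do
    ! Remainder loop for sizes that are not a multiple of NUNROLL
    do i = nmain + 1, size(b)
       a(i) = b(i) + c(i)
    end do
  end function add_unrolled

  ! Hypothetical broken variant: the offset k is never added, so each
  ! block of NUNROLL elements only gets its first element computed
  pure function add_unrolled_no_offset(b, c) result(a)
    real, intent(in) :: b(:), c(:)
    real :: a(size(b))
    integer :: i, k, nmain
    nmain = size(b) - mod(size(b), NUNROLL)
    a = 0.0
    do i = 1, nmain, NUNROLL
       do k = 0, NUNROLL - 1
          a(i) = b(i) + c(i)
       end do
    end do
    do i = nmain + 1, size(b)
       a(i) = b(i) + c(i)
    end do
  end function add_unrolled_no_offset
end module unroll_model

program unroll_demo
  use unroll_model
  implicit none
  integer, parameter :: n = 20
  real :: b(n), c(n), ref(n)
  integer :: i

  do i = 1, n
     b(i) = real(i) ; c(i) = 2.0 * real(i)
  end do
  ref = b + c

  print *, "correct unroll, wrong elements:", count(add_unrolled(b, c) /= ref)
  print *, "no offset,      wrong elements:", count(add_unrolled_no_offset(b, c) /= ref)
end program unroll_demo
```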
mkcolg



Joined: 30 Jun 2004
Posts: 6128
Location: The Portland Group Inc.

Posted: Fri Apr 19, 2013 8:29 am

Thanks escj. I've added a problem report (TPR#19298) and sent it on to engineering.

- Mat
jtull



Joined: 30 Jun 2004
Posts: 438

Posted: Fri Jun 07, 2013 4:11 pm    Post subject: TPR 19298 has been corrected in the 13.6 release

Thank you.

dave
All times are GMT - 7 Hours