escj
Joined: 30 Sep 2009 Posts: 37 Location: Laboratoire d'Aérologie, Toulouse, FRANCE
Posted: Fri Apr 19, 2013 6:46 am Post subject: Acc unroll bug for PGI 12.X ... 13.4, OK for PGI 11.10
Hello (;-) same player again...)
The !$acc ... unroll(X) directive produces wrong results with PGI 12.X through PGI 13.4, but works correctly with PGI 11.10.
I discovered it with the simple STREAM benchmark, when my new Titan card delivered 337 GB/s, more than the 288 GB/s maximum the card can reach!
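For reference, here is a sketch of how a STREAM-style bandwidth figure is usually computed for the add kernel used below, assuming the STREAM convention of counting two 4-byte REAL loads and one 4-byte REAL store per element; the kernel time t is not given in this post:

```latex
B = \frac{3 \times 4\ \text{bytes} \times n}{t}, \qquad
n = 1024 \times 1024 \;\Rightarrow\; 3 \times 4 \times 1\,048\,576 = 12\,582\,912\ \text{bytes per pass}

t_{\min} = \frac{12\,582\,912\ \text{bytes}}{288 \times 10^{9}\ \text{bytes/s}} \approx 43.7\ \mu\text{s}
```

A pass that appears to finish faster than about 43.7 µs (i.e. B > 288 GB/s) suggests the kernel did not actually read and write all of those bytes.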
This simple code compares the same computation done on the host and on the device with unroll(8).
The code:
Code:
PROGRAM TEST_UNROLL
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 1024*1024
  REAL, DIMENSION(n) :: ad, bd, cd, ah, bh, ch
  INTEGER, PARAMETER :: NUNROLL = 8
  INTEGER :: i

  ! Initialize the host (h) and device (d) arrays in the same way
  do i = 1, n
     ah(i) = 0.0 ; bh(i) = 0.5 * i*i ; ch(i) = 0.25 * i*i*i
  end do
  ad(:) = ah(:) ; bd(:) = bh(:) ; cd(:) = ch(:)

  ! Host reference computation
  do i = 1, n
     ah(i) = bh(i) + ch(i)
  end do

  ! Device computation with unrolling (PGI Accelerator directive)
  ! OpenACC equivalent: !$acc kernels loop gang, vector unroll(NUNROLL)
  !$acc region do parallel unroll(NUNROLL)
  do i = 1, n
     ad(i) = bd(i) + cd(i)
  end do

  ! Host and device perform the same additions, so the difference should be 0
  print *, "ERR(Device - Host) =", ad(n) - ah(n) ; call flush(6)
END PROGRAM TEST_UNROLL
Compilation with PGI 11.10:
Code:
pgf90 --version -Minfo=acc -ta=host,nvidia,keepgpu test_unroll.f90 -o test_unroll_pgi11.10 2>&1 | egrep 'target|unroll'
test_unroll:
27, !$acc do parallel unroll(8), vector(256) ! blockidx%x threadidx%x
pgf90 11.10-0 64-bit target on x86-64 Linux -tp nehalem
Execution with PGI 11.10:
Code:
test_unroll_pgi11.10
ERR(Device - Host) = 0.000000
Compilation with PGI 12.X through 13.4:
Code:
pgf90 --version -Minfo=acc -ta=host,nvidia,keepgpu test_unroll.f90 -o test_unroll_pgi13.04 2>&1 | egrep 'target|unroll'
test_unroll:
27, !$acc loop gang unroll(8), vector(128) ! blockidx%x threadidx%x
pgf90 13.4-0 64-bit target on x86-64 Linux -tp nehalem
Execution with PGI 13.4:
Code:
test_unroll_pgi13.04
ERR(Device - Host) = -2.8823089E+17
Cheers,
Juan
PS1: the bug is the same with the PGI Accelerator directives and with the OpenACC directives.
PS2: looking at the generated GPU code, the compiler forgets to increment the pointers to the ad, bd and cd arrays.