|
| View previous topic :: View next topic |
| Author |
Message |
escj
Joined: 30 Sep 2009 Posts: 37 Location: Laboratoire d'Aérologie, Toulouse, FRANCE
|
Posted: Fri Apr 05, 2013 8:09 am Post subject: pgi13.3 PB , mirror/host data asynchronously updated ?? |
|
|
Hello .
A strange/random problem with a simple case of management of Module data with mirror declaration and allocation via a module routine .
No problem with pgi/13.1
Problem with pgi/13.3
Test done on a GeForce GTX 470 with cuda5.0
After the update host the result must by 100.0 ( randomly some time the answer could be OK )
| Code: | test_update
BEFORE UPDATE AA= 0.000000
AFTER UPDATE AA= 0.000000 |
If run with PGI_ACC_SYNCHRONOUS=1, all is OK again ... but the code don't use any async directive !
| Code: |
PGI_ACC_SYNCHRONOUS=1 test_update
BEFORE UPDATE AA= 0.000000
AFTER UPDATE AA= 100.0000
|
The answer is also OK , if the allocate is done directly in the main program ( but not usable in a real application with hundred variable )
Compilation
| Code: | | pgf90 -ta=nvidia test_update.f90 -o test_update |
Source
| Code: |
MODULE MODD_DATA
IMPLICIT NONE
INTEGER, PARAMETER :: Nvec=2048
REAL , ALLOCATABLE, DIMENSION(:) :: AA
!$acc mirror(AA)
CONTAINS
SUBROUTINE ALLOC_DATA_MODULE()
IMPLICIT NONE
ALLOCATE( AA(Nvec) )
END SUBROUTINE ALLOC_DATA_MODULE
SUBROUTINE INIT_DATA(XTAB,XVAL)
IMPLICIT NONE
REAL , DIMENSION(:) :: XTAB
!$acc reflected (XTAB)
REAL :: XVAL
!$acc kernels
XTAB = XVAL
!$acc end kernels
END SUBROUTINE INIT_DATA
END MODULE MODD_DATA
PROGRAM TEST_ASYNC
USE MODD_DATA
IMPLICIT NONE
CALL ALLOC_DATA_MODULE()
!!$ ALLOCATE( AA(Nvec) )
CALL INIT_DATA(AA, 10.0 )
print*, "BEFORE UPDATE AA=" , AA(Nvec) ; call flush(6)
!$acc update host(AA)
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
END PROGRAM TEST_ASYNC
|
A+
Juan |
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4999 Location: The Portland Group Inc.
|
Posted: Fri Apr 05, 2013 10:54 am Post subject: |
|
|
Hi Juan,
I think what's happening is that the update isn't completing before the variable is being used in the calculation. However, I'm only able to reproduce this issue in 13.2. 13.3 seems ok. Granted, I was getting intermittent correct answers with 13.2 until I initialize the host copy, so 13.3 may be a false positive as well, but I'd like you to please double check.
Thanks,
Mat
| Code: | % cat test_update.f90
MODULE MODD_DATA
IMPLICIT NONE
INTEGER, PARAMETER :: Nvec=2048
REAL , ALLOCATABLE, DIMENSION(:) :: AA
!$acc mirror(AA)
CONTAINS
SUBROUTINE ALLOC_DATA_MODULE()
IMPLICIT NONE
ALLOCATE( AA(Nvec) )
END SUBROUTINE ALLOC_DATA_MODULE
SUBROUTINE INIT_DATA(XTAB,XVAL)
IMPLICIT NONE
REAL , DIMENSION(:) :: XTAB
!$acc reflected (XTAB)
REAL :: XVAL
!$acc kernels
XTAB = XVAL
!$acc end kernels
END SUBROUTINE INIT_DATA
END MODULE MODD_DATA
PROGRAM TEST_ASYNC
USE MODD_DATA
IMPLICIT NONE
CALL ALLOC_DATA_MODULE()
!!$ ALLOCATE( AA(Nvec) )
CALL INIT_DATA(AA, 10.0 )
AA=10.0
!acc update host(AA) << Moving the host update here works around the issue
print*, "BEFORE UPDATE AA=" , AA(Nvec) ; call flush(6)
!$acc update host(AA)
!acc wait << This seems to also work around the issue in 13.2
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
END PROGRAM TEST_ASYNC
% pgf90 test_update.f90 -o test_update -V13.2 -ta=nvidia ; test_update
BEFORE UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
% pgf90 test_update.f90 -o test_update -V13.3 -ta=nvidia ; test_update
BEFORE UPDATE AA= 10.00000
AFTER UPDATE AA= 100.0000
|
|
|
| Back to top |
|
 |
escj
Joined: 30 Sep 2009 Posts: 37 Location: Laboratoire d'Aérologie, Toulouse, FRANCE
|
Posted: Mon Apr 08, 2013 6:45 am Post subject: |
|
|
Hello Mat .
Double Check ... it doesn't work correctly with pgi13.3 ...
Has you, before submitting the bug, I have tested the wait directive with no success ... Even with 2 or 3 wait after the update it doesn't work ...
I recheck the test, and trying your suggestion ...
... the only way to make it work correctly/reproductively is to put a double udpate host before using the value in the host part ?!
( initialization on the host or print have no/marginal effect )
| Code: |
!$acc update host(AA)
!$acc update host(AA)
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
|
Compilation with this double update :
| Code: |
pgf90 --version -ta=nvidia,cc13,cc20,cuda5.0 -Minfo=ccff,all,intensity -Mprof=ccff test_update_mat.f90 -o test_update_mat_133_dble_update 2>&1 | egrep "pgf90|update"
34, Generating update host(aa(:))
35, Generating update host(aa(:))
pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem
|
Run 10 times test_update_mat_133_dble_update
| Code: |
for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_dble_update; done
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
AFTER UPDATE AA= 100.0000
|
Compilation with one update
| Code: |
pgf90 --version -ta=nvidia,cc13,cc20,cuda5.0 -Minfo=ccff,all,intensity -Mprof=ccff test_update_mat.f90 -o test_update_mat_133_one_update 2>&1 | egrep "pgf90|update"
35, Generating update host(aa(:))
pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem
|
Run 10 times test_update_mat_133_one_update
| Code: |
for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_one_update; done
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
AFTER UPDATE AA= 10.00000
|
Another point to show that the pgi13.3 is in fault, the same executable produce a cuMemcpyDtoHAsync error in a previous generation of GPU on a GXT280
| Code: |
pgaccelinfo
CUDA Driver Version: 5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.54 Sat Sep 29 00:05:49 PDT 2012
CUDA Device Number: 0
Device Name: GeForce GTX 280
Device Revision Number: 1.3
...
for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_one_update; done
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
|
No problem with the double update on this platform to ....
I post the simplified source code again to be sure we are doing the same test :
| Code: |
MODULE MODD_DATA
IMPLICIT NONE
INTEGER, PARAMETER :: Nvec=2048
REAL , ALLOCATABLE, DIMENSION(:) :: AA
!$acc mirror(AA)
CONTAINS
SUBROUTINE ALLOC_DATA_MODULE()
IMPLICIT NONE
ALLOCATE( AA(Nvec) )
END SUBROUTINE ALLOC_DATA_MODULE
SUBROUTINE INIT_DATA(XTAB,XVAL)
IMPLICIT NONE
REAL , DIMENSION(:) :: XTAB
!$acc reflected (XTAB)
REAL :: XVAL
!$acc kernels
XTAB = XVAL
!$acc end kernels
END SUBROUTINE INIT_DATA
END MODULE MODD_DATA
PROGRAM TEST_ASYNC
USE MODD_DATA
IMPLICIT NONE
CALL ALLOC_DATA_MODULE()
CALL INIT_DATA(AA, 10.0 )
!acc update host(AA)
!$acc update host(AA)
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
END PROGRAM TEST_ASYNC
|
A+
Juan |
|
| Back to top |
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4999 Location: The Portland Group Inc.
|
Posted: Mon Apr 08, 2013 10:56 am Post subject: |
|
|
Hi Juan,
I'm still not able to consistently reproduce this error leading me to believe that there is a UMR or synchronization issue going on. In 13.2 it fails only part of the time and in 13.3 it only fails for me when I compile to a generic CPU target (-tp=px-64). Though, that should be enough for our engineers to work with to determine the cause. I've added TPR#19269 to track this issue.
Another work around would be to move to using OpenACC data regions and the present clause:
| Code: | $ cat test_update_mat2.f90
MODULE MODD_DATA
IMPLICIT NONE
INTEGER, PARAMETER :: Nvec=2048
REAL , ALLOCATABLE, DIMENSION(:) :: AA
!acc mirror(AA)
CONTAINS
SUBROUTINE ALLOC_DATA_MODULE()
IMPLICIT NONE
ALLOCATE( AA(Nvec) )
END SUBROUTINE ALLOC_DATA_MODULE
SUBROUTINE INIT_DATA(XTAB,XVAL)
IMPLICIT NONE
REAL , DIMENSION(:) :: XTAB
!acc reflected (XTAB)
REAL :: XVAL
!$acc kernels present(XTAB)
XTAB = XVAL
!$acc end kernels
END SUBROUTINE INIT_DATA
END MODULE MODD_DATA
PROGRAM TEST_ASYNC
USE MODD_DATA
IMPLICIT NONE
CALL ALLOC_DATA_MODULE()
!$acc data create(AA)
CALL INIT_DATA(AA, 10.0 )
!acc update host(AA)
!$acc update host(AA)
AA(Nvec) = AA(Nvec) * 10.0
print*, "AFTER UPDATE AA=" , AA(Nvec) ; call flush(6)
!$acc end data
END PROGRAM TEST_ASYNC |
Best Regards,
Mat |
|
| Back to top |
|
 |
escj
Joined: 30 Sep 2009 Posts: 37 Location: Laboratoire d'Aérologie, Toulouse, FRANCE
|
Posted: Wed Apr 10, 2013 4:15 am Post subject: |
|
|
Hello Mat , thank you for the bug report ...
;-) The create(AA, ... ) directive is not really applicable in our code, it will become a directive spanning over 100 pages !
It's definitively a 'Warm-up/Foo' problem ...
With my previous version with mirror data ( ;-) waiting for device_resident ... )
just adding an unused foo mirrored array a the beging of the main program solve the issue .
| Code: |
PROGRAM TEST_ASYNC
USE MODD_DATA
IMPLICIT NONE
INTEGER, ALLOCATABLE, DIMENSION(:) :: ifoo
!$acc mirror(ifoo)
ALLOCATE (ifoo(1))
...
|
;-) So now Next Level ...
In my Original unit test the goal was to have a generic ALLOC_DATA_MODULE(), with an argument (XTAB) callable with data mirrored in data module=AA , something like this :
| Code: |
SUBROUTINE ALLOC_DATA_MODULE(XTAB)
IMPLICIT NONE
REAL , ALLOCATABLE, DIMENSION(:) :: XTAB
!$acc declare mirror (XTAB)
ALLOCATE( XTAB(Nvec) )
END SUBROUTINE ALLOC_DATA_MODULE
|
This doesn't work , because the device part is deallocated at the exit of the alloc_data routine
| Code: |
PGI_ACC_DEBUG=3 test_update_generic 2>&1 | egrep -n 'mirror.*xtab'
19:pgi_uacc_mirror_alloc(size=2048,elemsize=4,hosthandle=0x16172e0,lineno=13,name=xtab)
25:pgi_uacc_mirror_alloc(size=2048,elemsize=4,lineno=13,name=xtab) returns 0xb00300000
26:pgi_uacc_mirror_dealloc(ptr=0xb00300000,lineno=15,name=xtab)
|
Will this be possible with the futur OpenAcc directives like device_resident
( or later with enter/exit data in openacc 2.0 ? )
A+
Juan |
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2002 phpBB Group
|