PGI User Forum

pgi13.3 PB , mirror/host data asynchronously updated ??
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Fri Apr 05, 2013 8:09 am

Hello.

I am seeing a strange, intermittent problem in a simple test case: module data declared with the mirror directive and allocated through a module routine.

No problem with pgi/13.1
Problem with pgi/13.3

Tested on a GeForce GTX 470 with CUDA 5.0.

After the update host directive the result must be 100.0 (occasionally, at random, the answer is correct):
Code:
 test_update
 BEFORE UPDATE AA=    0.000000   
 AFTER  UPDATE AA=    0.000000


If run with PGI_ACC_SYNCHRONOUS=1, all is OK again ... but the code doesn't use any async directive!

Code:
 
 PGI_ACC_SYNCHRONOUS=1 test_update
 BEFORE UPDATE AA=    0.000000   
 AFTER  UPDATE AA=    100.0000
 


The answer is also correct if the allocate is done directly in the main program (but that is not usable in a real application with hundreds of variables).


Compilation
Code:
 pgf90 -ta=nvidia test_update.f90 -o test_update


Source
Code:
 
MODULE MODD_DATA
IMPLICIT NONE
  INTEGER, PARAMETER :: Nvec=2048
  REAL , ALLOCATABLE, DIMENSION(:) :: AA
  !$acc mirror(AA)
CONTAINS

  SUBROUTINE ALLOC_DATA_MODULE()
    IMPLICIT NONE
    ALLOCATE( AA(Nvec) )
  END SUBROUTINE ALLOC_DATA_MODULE

  SUBROUTINE INIT_DATA(XTAB,XVAL)
    IMPLICIT NONE
    REAL , DIMENSION(:)              :: XTAB   
    !$acc reflected (XTAB)
    REAL                             :: XVAL

    !$acc kernels
    XTAB = XVAL
    !$acc end kernels
 
  END SUBROUTINE INIT_DATA
END MODULE MODD_DATA

PROGRAM TEST_ASYNC
USE MODD_DATA
  IMPLICIT NONE

  CALL ALLOC_DATA_MODULE()
!!$    ALLOCATE( AA(Nvec) )

  CALL INIT_DATA(AA, 10.0 )

  print*, "BEFORE UPDATE AA=" , AA(Nvec) ;  call flush(6)

  !$acc update host(AA)
  AA(Nvec) =   AA(Nvec) * 10.0
  print*, "AFTER  UPDATE AA=" , AA(Nvec) ; call flush(6)

END PROGRAM TEST_ASYNC
 


A+

Juan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Apr 05, 2013 10:54 am

Hi Juan,

I think what's happening is that the update isn't completing before the variable is used in the calculation. However, I'm only able to reproduce this issue in 13.2; 13.3 seems OK. Granted, I was getting intermittent correct answers with 13.2 until I initialized the host copy, so the 13.3 result may be a false positive as well, but I'd like you to please double check.

Thanks,
Mat


Code:
% cat test_update.f90
MODULE MODD_DATA
IMPLICIT NONE
  INTEGER, PARAMETER :: Nvec=2048
  REAL , ALLOCATABLE, DIMENSION(:) :: AA
  !$acc mirror(AA)
CONTAINS

  SUBROUTINE ALLOC_DATA_MODULE()
    IMPLICIT NONE
    ALLOCATE( AA(Nvec) )
  END SUBROUTINE ALLOC_DATA_MODULE

  SUBROUTINE INIT_DATA(XTAB,XVAL)
    IMPLICIT NONE
    REAL , DIMENSION(:)              :: XTAB   
    !$acc reflected (XTAB)
    REAL                             :: XVAL

    !$acc kernels
    XTAB = XVAL
    !$acc end kernels
 
  END SUBROUTINE INIT_DATA
END MODULE MODD_DATA

PROGRAM TEST_ASYNC
USE MODD_DATA
  IMPLICIT NONE
  CALL ALLOC_DATA_MODULE()
!!$    ALLOCATE( AA(Nvec) )

  CALL INIT_DATA(AA, 10.0 )
  AA=10.0
  !acc update host(AA)  << Moving the host update here works around the issue
  print*, "BEFORE UPDATE AA=" , AA(Nvec) ;  call flush(6)
 
  !$acc update host(AA)
  !acc wait   << This seems to also work around the issue in 13.2

  AA(Nvec) =   AA(Nvec) * 10.0
  print*, "AFTER  UPDATE AA=" , AA(Nvec) ; call flush(6)

END PROGRAM TEST_ASYNC
 
% pgf90 test_update.f90 -o test_update -V13.2 -ta=nvidia  ; test_update
 BEFORE UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
% pgf90 test_update.f90 -o test_update -V13.3 -ta=nvidia  ; test_update
 BEFORE UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    100.0000   
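
The remark above about initializing the host copy suggests one way to make the test less dependent on luck: fill the host copy with a sentinel that the device never writes, so a stale host value cannot pass by accident. A minimal sketch, reusing the thread's mirror/reflected test case; the sentinel value -1.0, the program name TEST_SENTINEL and the PASS/FAIL messages are assumptions, not something from the thread, and it has not been run against the PGI releases discussed here:

```fortran
MODULE MODD_DATA
IMPLICIT NONE
  INTEGER, PARAMETER :: Nvec=2048
  REAL , ALLOCATABLE, DIMENSION(:) :: AA
  !$acc mirror(AA)
CONTAINS

  SUBROUTINE ALLOC_DATA_MODULE()
    IMPLICIT NONE
    ALLOCATE( AA(Nvec) )
  END SUBROUTINE ALLOC_DATA_MODULE

  SUBROUTINE INIT_DATA(XTAB,XVAL)
    IMPLICIT NONE
    REAL , DIMENSION(:)              :: XTAB
    !$acc reflected (XTAB)
    REAL                             :: XVAL

    !$acc kernels
    XTAB = XVAL
    !$acc end kernels

  END SUBROUTINE INIT_DATA
END MODULE MODD_DATA

PROGRAM TEST_SENTINEL
USE MODD_DATA
  IMPLICIT NONE
  ! Assumed sentinel: a value the device kernel never writes.
  REAL, PARAMETER :: SENTINEL = -1.0

  CALL ALLOC_DATA_MODULE()
  ! Plain host assignment outside any compute region, so it is expected
  ! to touch the host copy only; the device copy is set by INIT_DATA.
  AA = SENTINEL
  CALL INIT_DATA(AA, 10.0 )

  !$acc update host(AA)

  ! After a completed update every host element should equal 10.0.
  IF ( ALL( AA == 10.0 ) ) THEN
    print*, "PASS: host copy updated"
  ELSE IF ( ANY( AA == SENTINEL ) ) THEN
    print*, "FAIL: at least one host element still holds the sentinel"
  ELSE
    print*, "FAIL: unexpected host values, AA(Nvec)=", AA(Nvec)
  END IF
END PROGRAM TEST_SENTINEL
```

Checking the whole array rather than only AA(Nvec) is a design choice of this sketch: it also catches a transfer that has only partly landed when the host reads the data.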

escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Mon Apr 08, 2013 6:45 am

Hello Mat.

I double-checked: it does not work correctly with pgi13.3.

Like you, before submitting the bug I tested the wait directive, with no success. Even with 2 or 3 wait directives after the update it doesn't work.

I rechecked the test and tried your suggestion.
The only way to make it work correctly and reproducibly is to put a double update host before using the value in the host part ?!
(Initializing on the host or adding the print has no effect, or only a marginal one.)

Code:

  !$acc update host(AA)
  !$acc update host(AA)
  AA(Nvec) =   AA(Nvec) * 10.0
  print*, "AFTER  UPDATE AA=" , AA(Nvec) ; call flush(6)


Compilation with this double update :
Code:

pgf90 --version -ta=nvidia,cc13,cc20,cuda5.0 -Minfo=ccff,all,intensity -Mprof=ccff test_update_mat.f90 -o test_update_mat_133_dble_update 2>&1 | egrep "pgf90|update"
     34, Generating update host(aa(:))
     35, Generating update host(aa(:))
pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem


Run 10 times test_update_mat_133_dble_update
Code:

 for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_dble_update; done
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000   
 AFTER  UPDATE AA=    100.0000


Compilation with one update
Code:

 pgf90 --version -ta=nvidia,cc13,cc20,cuda5.0 -Minfo=ccff,all,intensity -Mprof=ccff test_update_mat.f90 -o test_update_mat_133_one_update 2>&1 | egrep "pgf90|update"
     35, Generating update host(aa(:))
pgf90 13.3-0 64-bit target on x86-64 Linux -tp nehalem


Run 10 times test_update_mat_133_one_update
Code:

 for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_one_update; done
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000   
 AFTER  UPDATE AA=    10.00000 


Another point showing that pgi13.3 is at fault: the same executable produces a cuMemcpyDtoHAsync error on a previous-generation GPU, a GTX 280.

Code:

 pgaccelinfo
CUDA Driver Version:           5000
NVRM version: NVIDIA UNIX x86_64 Kernel Module  304.54  Sat Sep 29 00:05:49 PDT 2012

CUDA Device Number:            0
Device Name:                   GeForce GTX 280
Device Revision Number:        1.3
...
 for i in $( seq 10 ) ; do PGI_ACC_SYNCHRONOUS=0 test_update_mat_133_one_update; done
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value
call to cuMemcpyDtoHAsync returned error 1: Invalid value


There is no problem with the double update on this platform either.

I am posting the simplified source code again to be sure we are running the same test:
Code:

MODULE MODD_DATA
IMPLICIT NONE
  INTEGER, PARAMETER :: Nvec=2048
  REAL , ALLOCATABLE, DIMENSION(:) :: AA
  !$acc mirror(AA)
CONTAINS

  SUBROUTINE ALLOC_DATA_MODULE()
    IMPLICIT NONE
    ALLOCATE( AA(Nvec) )
  END SUBROUTINE ALLOC_DATA_MODULE

  SUBROUTINE INIT_DATA(XTAB,XVAL)
    IMPLICIT NONE
    REAL , DIMENSION(:)              :: XTAB   
    !$acc reflected (XTAB)
    REAL                             :: XVAL

    !$acc kernels
    XTAB = XVAL
    !$acc end kernels
 
  END SUBROUTINE INIT_DATA
END MODULE MODD_DATA

PROGRAM TEST_ASYNC
USE MODD_DATA
  IMPLICIT NONE

  CALL ALLOC_DATA_MODULE()

  CALL INIT_DATA(AA, 10.0 )

  !acc update host(AA)
  !$acc update host(AA)
  AA(Nvec) =   AA(Nvec) * 10.0
  print*, "AFTER  UPDATE AA=" , AA(Nvec) ; call flush(6)

END PROGRAM TEST_ASYNC


A+

Juan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Apr 08, 2013 10:56 am

Hi Juan,

I'm still not able to consistently reproduce this error, leading me to believe that there is a UMR (uninitialized memory read) or synchronization issue going on. In 13.2 it fails only part of the time, and in 13.3 it only fails for me when I compile to a generic CPU target (-tp=px-64). Still, that should be enough for our engineers to work with to determine the cause. I've added TPR#19269 to track this issue.

Another workaround would be to move to using OpenACC data regions and the present clause:

Code:
$ cat test_update_mat2.f90
MODULE MODD_DATA
IMPLICIT NONE
  INTEGER, PARAMETER :: Nvec=2048
  REAL , ALLOCATABLE, DIMENSION(:) :: AA
  !acc mirror(AA)
CONTAINS

  SUBROUTINE ALLOC_DATA_MODULE()
    IMPLICIT NONE
    ALLOCATE( AA(Nvec) )
  END SUBROUTINE ALLOC_DATA_MODULE

  SUBROUTINE INIT_DATA(XTAB,XVAL)
    IMPLICIT NONE
    REAL , DIMENSION(:)              :: XTAB   
    !acc reflected (XTAB)
    REAL                             :: XVAL

    !$acc kernels present(XTAB)
    XTAB = XVAL
    !$acc end kernels
 
  END SUBROUTINE INIT_DATA
END MODULE MODD_DATA

PROGRAM TEST_ASYNC
  USE MODD_DATA
  IMPLICIT NONE

  CALL ALLOC_DATA_MODULE()

!$acc data create(AA)

  CALL INIT_DATA(AA, 10.0 )

  !acc update host(AA)
  !$acc update host(AA)
  AA(Nvec) =   AA(Nvec) * 10.0
  print*, "AFTER  UPDATE AA=" , AA(Nvec) ; call flush(6)

!$acc end data

END PROGRAM TEST_ASYNC


Best Regards,
Mat
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Wed Apr 10, 2013 4:15 am

Hello Mat, thank you for the bug report.

;-) The create(AA, ... ) directive is not really applicable in our code: it would become a directive spanning over 100 pages!

It's definitely a 'Warm-up/Foo' problem.
With my previous version with mirror data ( ;-) waiting for device_resident ... ),
just adding an unused mirrored foo array at the beginning of the main program solves the issue.

Code:

PROGRAM TEST_ASYNC
USE MODD_DATA
  IMPLICIT NONE

  INTEGER, ALLOCATABLE, DIMENSION(:) :: ifoo
  !$acc mirror(ifoo)
  ALLOCATE (ifoo(1))

...


;-) So now, next level ...

In my original unit test the goal was to have a generic ALLOC_DATA_MODULE() taking an argument (XTAB), callable with data mirrored in the data module (here AA), something like this:
Code:

  SUBROUTINE ALLOC_DATA_MODULE(XTAB)
    IMPLICIT NONE
    REAL , ALLOCATABLE, DIMENSION(:) :: XTAB
    !$acc declare mirror (XTAB) 

    ALLOCATE( XTAB(Nvec) )

  END SUBROUTINE ALLOC_DATA_MODULE


This doesn't work, because the device copy is deallocated on exit from the allocation routine:
Code:

 PGI_ACC_DEBUG=3 test_update_generic 2>&1 | egrep -n 'mirror.*xtab'
19:pgi_uacc_mirror_alloc(size=2048,elemsize=4,hosthandle=0x16172e0,lineno=13,name=xtab)
25:pgi_uacc_mirror_alloc(size=2048,elemsize=4,lineno=13,name=xtab) returns 0xb00300000
26:pgi_uacc_mirror_dealloc(ptr=0xb00300000,lineno=15,name=xtab)


Will this be possible with the future OpenACC directives like device_resident
(or later with enter/exit data in OpenACC 2.0)?

A+
Juan
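
On the enter/exit data question, a minimal sketch of how the generic allocator could look with the OpenACC 2.0 unstructured data directives. It assumes a compiler that implements enter data and exit data; FREE_DATA_MODULE and the program name TEST_ENTER_DATA are hypothetical names that do not appear in the thread, and nothing here has been checked against the PGI releases discussed above:

```fortran
MODULE MODD_DATA
IMPLICIT NONE
  INTEGER, PARAMETER :: Nvec=2048
  REAL , ALLOCATABLE, DIMENSION(:) :: AA
CONTAINS

  ! Generic allocator: allocate on the host, then create a device copy.
  ! With enter data the device copy is not tied to the scope of this
  ! routine; it stays until a matching exit data.
  SUBROUTINE ALLOC_DATA_MODULE(XTAB)
    IMPLICIT NONE
    REAL , ALLOCATABLE, DIMENSION(:), INTENT(INOUT) :: XTAB
    ALLOCATE( XTAB(Nvec) )
    !$acc enter data create(XTAB)
  END SUBROUTINE ALLOC_DATA_MODULE

  ! Hypothetical counterpart: release the device copy, then the host one.
  SUBROUTINE FREE_DATA_MODULE(XTAB)
    IMPLICIT NONE
    REAL , ALLOCATABLE, DIMENSION(:), INTENT(INOUT) :: XTAB
    !$acc exit data delete(XTAB)
    DEALLOCATE( XTAB )
  END SUBROUTINE FREE_DATA_MODULE

  SUBROUTINE INIT_DATA(XTAB,XVAL)
    IMPLICIT NONE
    REAL , DIMENSION(:)              :: XTAB
    REAL                             :: XVAL

    ! present() asserts that the caller already has XTAB on the device.
    !$acc kernels present(XTAB)
    XTAB = XVAL
    !$acc end kernels

  END SUBROUTINE INIT_DATA
END MODULE MODD_DATA

PROGRAM TEST_ENTER_DATA
USE MODD_DATA
  IMPLICIT NONE

  CALL ALLOC_DATA_MODULE(AA)
  CALL INIT_DATA(AA, 10.0 )

  !$acc update host(AA)
  AA(Nvec) =   AA(Nvec) * 10.0
  print*, "AFTER  UPDATE AA=" , AA(Nvec) ; call flush(6)

  CALL FREE_DATA_MODULE(AA)
END PROGRAM TEST_ENTER_DATA
```

As in the original test, the intended result is 100.0. Compared with a structured data region in the main program, this keeps each allocation and its device counterpart together in one routine, which is the property the generic ALLOC_DATA_MODULE(XTAB) was after.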
All times are GMT - 7 Hours