PGI User Forum
Fatal Usage Error with simple 'mirror' & ACC_DEVICE=HOST
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Sat Sep 22, 2012 4:49 pm    Post subject: Fatal Usage Error with simple 'mirror' & ACC_DEVICE=HOST

Hello, I'm using the latest pgi/12.8 (all versions have the same problem).

A simple 'mirror' example compiled in unified mode:
Code:

pgf90 -g -ta=nvidia,host  acc_mirror.f90  -o acc_mirror_unified


gives a Fatal Error when launched on the host with ACC_DEVICE=HOST:

Quote:

ACC_DEVICE=HOST acc_mirror_unified
Fatal Usage Error: __pgi_acc_mirrorall2 called before __pgi_cu_init


No problem on the device (GTX470):
Quote:

ACC_DEVICE=NVIDIA acc_mirror_unified
X= 3.141500



Here is the sample code (extracted from the real code):
Code:

MODULE my_data
  IMPLICIT NONE
  real , allocatable, dimension(:) :: XA
  !$acc mirror(XA)
END MODULE my_data

PROGRAM test_mirror_host

  USE my_data

  IMPLICIT NONE

  INTEGER, PARAMETER :: NX=64

  allocate (XA(NX))

  !$acc region
  XA = 3.1415
  !$acc end region

  !$acc update host(XA(NX:NX))
  print *,"X=", XA(NX)

END PROGRAM test_mirror_host


A+
Juan
mkcolg



Joined: 30 Jun 2004
Posts: 6215
Location: The Portland Group Inc.

Posted: Mon Sep 24, 2012 9:56 am

Hi Juan,

Hmm, it doesn't look like this has ever worked. I submitted a problem report (TPR#18938) and will have engineering see what they can do. The workaround would be to use a data region rather than mirror:

Code:
% cat acc_mirror1.f90
MODULE my_data
  IMPLICIT NONE
  real , allocatable, dimension(:) :: XA
  !acc mirror(XA)
END MODULE my_data

PROGRAM test_mirror_host

  USE my_data

  IMPLICIT NONE

  INTEGER, PARAMETER :: NX=64

  allocate (XA(NX))
!$acc data region copyout(XA)
  !$acc region
  XA = 3.1415
  !$acc end region
!$acc end data region

  print *,"X=", XA(NX)

END PROGRAM test_mirror_host
% pgf90 -ta=host,nvidia -Minfo acc_mirror1.f90 -o acc_mirror_unified -V12.9 ; acc_mirror_unified
test_mirror_host:
      7, PGI Unified Binary version for -tp=nehalem-64 -ta=host
     18, Memory set idiom, loop replaced by call to __c_mset4
test_mirror_host:
      7, PGI Unified Binary version for -tp=nehalem-64 -ta=nvidia
     16, Generating copyout(xa(:))
     17, Generating present_or_copyout(xa(:))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     18, Loop is parallelizable
         Accelerator kernel generated
         18, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 7 registers; 44 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 11 registers; 0 shared, 60 constant, 0 local memory bytes
 X=    3.141500   


Thanks again,
Mat
escj



Joined: 30 Sep 2009
Posts: 63
Location: Laboratoire d'Aérologie, Toulouse, FRANCE

Posted: Tue Sep 25, 2012 7:25 am

Hello Mat.

OK for the data region (that was my previous version), but I'm in an optimization phase at this point...

The XA work buffer is used many times in the code, typically for halo exchange with MPI...

So I am trying to allocate it once for the whole duration of the code...

I have already done that for the CPU part...
... but for the GPU part, if I put it in a "data region", I think the GPU memory is allocated and freed every time the code enters and exits the data region, so a lot of time is lost...

... is that right?
... and the mirror clause is just what I need...

A+

Juan
mkcolg



Joined: 30 Jun 2004
Posts: 6215
Location: The Portland Group Inc.

Posted: Tue Sep 25, 2012 8:36 am

Quote:
... but for the GPU part, if I put it in a "data region", I think the GPU memory is allocated and freed every time the code enters and exits the data region, so a lot of time is lost...
... is that right?
... and the mirror clause is just what I need...

Correct. You can still use mirror, but just not within a unified binary.

- Mat
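
When the uses of the buffer all sit inside one iteration loop, one way to limit the allocation cost discussed above while staying with a data region is to open a single data region around the whole loop, so the device copy is created once rather than on every pass. The following is only a minimal sketch under that assumption, not a tested workaround from this thread: the program name, NSTEPS, and the loop body are hypothetical, and the `local` clause assumes XA is pure scratch space on the device.

```fortran
PROGRAM test_data_region_hoisted
  IMPLICIT NONE

  INTEGER, PARAMETER :: NX = 64
  INTEGER, PARAMETER :: NSTEPS = 10   ! hypothetical number of iterations
  REAL, ALLOCATABLE, DIMENSION(:) :: XA
  INTEGER :: istep

  ALLOCATE (XA(NX))

  ! Single data region around the whole loop: the device copy of XA
  ! is intended to be created once here and released once at the end.
  !$acc data region local(XA)
  DO istep = 1, NSTEPS
    ! Compute region reuses the device buffer on every iteration
    !$acc region
    XA = 3.1415 * istep
    !$acc end region

    ! Copy back only the element needed on the host
    !$acc update host(XA(NX:NX))
  END DO
  !$acc end data region

  PRINT *, "X=", XA(NX)

  DEALLOCATE (XA)
END PROGRAM test_data_region_hoisted
```

This does not help when the buffer is touched from many separate routines, as in the halo-exchange case described above; that situation is what mirror is aimed at.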
jtull



Joined: 30 Jun 2004
Posts: 445

Posted: Fri May 17, 2013 5:11 pm

Juan,

TPR 18938 was fixed in the 13.2 release.

Thanks for the submission.

dave
All times are GMT - 7 Hours