PGI User Forum
Multiple GPUs with mirror and update clauses

 
KarlW



Joined: 12 Jan 2009
Posts: 23

Posted: Fri Oct 07, 2011 3:27 am    Post subject: Multiple GPUs with mirror and update clauses

Hi,

I am having issues when attempting to use the mirror and update device clauses with multiple GPUs. It seems as though only the first GPU is aware of the data that was reflected in a previous routine. This is also true of the initialisation of the second GPU.

More details:

I was recently getting the error "Fatal Usage Error: __pgi_cu_mirrordealloc called before __pgi_cu_init" at execution time. When I removed the deallocation (which isn't strictly needed), I got the similar error "Fatal Usage Error: __pgi_cu_mirroralloc called before __pgi_cu_init". This is associated with the allocation of an array that is defined as mirrored in a separate module and later updated using !$acc update(passed2). The problem only occurs now that I am trying to run the code across two OpenMP threads.

Further tweaking showed that, despite !$acc_init being called in an OpenMP region within an earlier subroutine, the initialisation doesn't seem to have carried over to this routine. Adding an !$acc_init to this routine removed the error described above but replaced it with the following error at compile time:

PGF90-S-0155-UPDATE clause requires a visible device copy for symbol passed2 (intega.f: 27998)

This error actually seems to be related to the specification of the passed2 array as being private for the OpenMP region.

Thanks for taking a look,

Karl
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Fri Oct 07, 2011 4:30 pm

Hi Karl,

I've used mirrored in an OpenMP program, but the variable needs to be private and must only be allocated after the program has entered a parallel region. A mirrored shared variable isn't yet supported.

Though, I have never seen the specific errors you're getting. Can you write a small reproducing example?
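The "private, and allocated only inside the parallel region" pattern described above can be sketched roughly as follows. This is a minimal illustration of the OpenMP side only: the accelerator directives are left out so that it builds with any OpenMP Fortran compiler (for example pgf90 -mp), and the program and variable names (private_alloc_sketch, nbad) are made up for the example, not taken from Karl's code.

```fortran
program private_alloc_sketch
  use omp_lib
  implicit none
  real, dimension(:), allocatable :: arr   ! deliberately unallocated before the region
  integer :: thd, nbad

  nbad = 0
!$omp parallel private(arr, thd) reduction(+:nbad)
  thd = omp_get_thread_num()
  ! Each thread allocates its own private copy after the region has started.
  allocate(arr(32))
  arr = real(thd)
  ! If the copies were really private, every element still holds this thread's id.
  if (any(arr /= real(thd))) nbad = nbad + 1
  ! Release the private copy before leaving the region.
  deallocate(arr)
!$omp end parallel

  if (nbad == 0) then
     print *, 'all threads OK'
  else
     print *, 'threads with wrong data:', nbad
  end if
end program private_alloc_sketch
```

The point of the layout is that arr never exists as a single shared allocation: the allocate and deallocate both sit inside the region, so each thread owns its array from start to finish.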

Thanks,
Mat
KarlW



Joined: 12 Jan 2009
Posts: 23

Posted: Fri Oct 28, 2011 7:31 am

Hi Mat,

I haven't been able to replicate the issue within a smaller piece of sample code I'm afraid.

I recently tried to bypass the issue by moving the code into a separate routine that is called from within the OpenMP region.

However, this results in some behaviour I would consider quite strange. My understanding is that the variables within a subroutine that is called from an OpenMP region are intrinsically private (unless specified otherwise). Unfortunately, this does not seem to be the case: I am getting errors that can be corrected by specifying the relevant variables as private.

Am I missing something simple here?

Cheers,

Karl
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Tue Nov 01, 2011 11:37 am

Hi Karl,

Quote:
My understanding is that the variables within a subroutine that is called from an OpenMP region are intrinsically private
Correct. Variables declared locally within a subroutine are implicitly private if the subroutine is called within an OpenMP parallel region. Hence, I suspect something else is going on, so I would need more details.
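One exception to this rule may be worth ruling out. It is a general OpenMP data-sharing rule, not something diagnosed in this thread: a local variable that has the SAVE attribute, including one that is initialised in its declaration, is a single shared copy rather than one per thread. The sketch below, with made-up names (counters, bump, sharing_sketch), contrasts the two cases using plain OpenMP and no accelerator directives.

```fortran
module counters
  implicit none
contains
  subroutine bump(nlocal, nsaved)
    integer, intent(out) :: nlocal, nsaved
    integer :: local_count            ! ordinary local: a fresh, private copy per call
    integer, save :: saved_count = 0  ! SAVE: one copy shared by every thread

    local_count = 0
    local_count = local_count + 1
    nlocal = local_count
    ! The shared counter must be protected, since all threads update the same copy.
!$omp critical
    saved_count = saved_count + 1
    nsaved = saved_count
!$omp end critical
  end subroutine bump
end module counters

program sharing_sketch
  use counters
  implicit none
  integer :: i, nlocal, nsaved, max_local, max_saved

  max_local = 0
  max_saved = 0
!$omp parallel do private(nlocal, nsaved) reduction(max:max_local, max_saved)
  do i = 1, 32
     call bump(nlocal, nsaved)
     max_local = max(max_local, nlocal)
     max_saved = max(max_saved, nsaved)
  end do
!$omp end parallel do
  print *, 'largest local count:', max_local
  print *, 'largest saved count:', max_saved
end program sharing_sketch
```

Whatever the thread count, the local counter never gets past 1, while the saved counter reaches 32 because all 32 calls increment the same variable. If a variable in the called routine behaves like the second case, checking it for SAVE or for an initialiser in its declaration is a quick first test.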

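Below is a small example program (mirror.f90) that uses a mirrored array inside a subroutine called from an OpenMP parallel loop, with each thread selecting its own GPU. Can you modify it so that it replicates the behavior you are seeing?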

Code:
% cat mirror.f90

program test
  use omp_lib
  implicit none
  integer i,thd,nthd
 
!$omp parallel do
  do i=1,32
     call testme(i)
  enddo 

end program test

subroutine testme (i)
  use omp_lib
#ifdef _ACCEL
  use accel_lib
#endif
  implicit none
  integer :: i, ii
  integer :: thd
  real, dimension(:), allocatable :: arr
!$acc mirror(arr)
  thd = omp_get_thread_num()
#ifdef _ACCEL
  call acc_set_device_num(thd, ACC_DEVICE_NVIDIA)
#endif
  allocate(arr(32))
  arr=0
!$acc region
  do ii=1,32
    arr(ii) = real(i) / (thd+ii)
  end do
!$acc end region
!$acc update host (arr)
  print *, thd, i, sum(arr)
end subroutine testme

% pgf90 -mp -Mpreprocess -Minfo=mp,accel mirror.f90 -ta=nvidia
test:
      7, Parallel region activated
      8, Parallel loop activated with static block schedule
     10, Parallel region terminated
testme:
     23, Generating local(arr(:))
     30, Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     31, Loop is parallelizable
         Accelerator kernel generated
         31, !$acc do parallel, vector(32) ! blockidx%x threadidx%x
             CC 1.0 : 11 registers; 48 shared, 40 constant, 0 local memory bytes; 33% occupancy
             CC 2.0 : 11 registers; 8 shared, 68 constant, 0 local memory bytes; 16% occupancy
     35, Generating !$acc update host(arr(:))
% setenv OMP_NUM_THREADS 4
% a.out
            3           25    57.83620   
            0            1    4.058496   
            2           17    44.50957   
            1            9    27.79918   
            3           26    60.14965   
            0            2    8.116991   
            2           18    47.12778   
            1           10    30.88798   
            3           27    62.46309   
            0            3    12.17549   
            2           19    49.74599   
            1           11    33.97678   
            3           28    64.77655   
            0            4    16.23398   
            2           20    52.36419   
            1           12    37.06558   
            3           29    67.09000   
            0            5    20.29248   
            2           21    54.98241   
            1           13    40.15438   
            3           30    69.40344   
            0            6    24.35097   
            2           22    57.60062   
            1           14    43.24318   
            3           31    71.71688   
            0            7    28.40947   
            2           23    60.21883   
            1           15    46.33197   
            3           32    74.03034   
            0            8    32.46796   
            2           24    62.83704   
            1           16    49.42077   


- Mat
All times are GMT - 7 Hours