PGI User Forum
PGI Accelerator programming concepts questions

Ankhazam
Joined: 24 Aug 2010
Posts: 7

Posted: Mon Sep 27, 2010 2:05 am    Post subject: PGI Accelerator programming concepts questions

Hello,
the tutorials provided are very handy, but I have some questions to confirm that I understand this programming model properly:
1) When I explicitly create a data region with copyin/copyout/local clauses, are these clauses handled only at the beginning and at the end of the data region, and not at each compute region, no matter how many compute (acc) regions I create inside it?
2) Can I nest data regions within one another?
3) IF/SWITCH clauses have to be avoided only within compute ACC regions; can I use them normally in data regions?
4) Can I put a subroutine call within a data region (where that subroutine itself contains compute regions)?
5) Could you please provide a short code example showing how to use the update clause?

And some questions concerning the PGI Accelerator environment:
1) I'm deploying accelerated Fortran 90 software on a Tesla rack server (4xGT200) and enabling multi-GPU use via MPI.
a) ACC_NOTIFY shows me kernel launch info only from the process with rank 0, even though 4 separate GPUs are in use (one MPI process per core, each with one GPU). Can I see the information from all 4 processes?
b) When can I expect support for the PGI Accelerator model within OpenMP regions?
2) How should I use the pgi_accinit tool? Is running it in the background enough?
3) When I compile the software with static common blocks greater than 2GB, I get some compiler errors (even without acceleration). Adding -mcmodel=medium fixes the errors within my own software, but the same errors still occur in some HPF libraries from the PGI compilers directory (Linux x86_64, Fedora 11).

Thank you in advance for your replies.

mkcolg
Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Mon Sep 27, 2010 1:09 pm

Quote:
1) When I explicitly create a data region with copyin/copyout/local clauses, are these clauses handled only at the beginning and at the end of the data region, and not at each compute region, no matter how many compute (acc) regions I create inside it?

Correct. The exception is when you use the update directive.

Quote:
2) Can I nest data regions within one another?

Yes.
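
A minimal sketch of what nesting could look like, assuming the same 10.x directive spelling used in the update example later in this post. The program name, array names and arithmetic are made up for illustration. The directives are comments to any other Fortran compiler, so the host result is the same with or without acceleration.

```fortran
program nested_data
   implicit none
   integer, parameter :: n = 1024
   real :: a(n), b(n), c(n)
   integer :: i

   a = 1.0

   ! outer data region: a is copied in here and copied back at its end,
   ! b lives on the device only
!$acc data region copy(a), local(b)
!$acc region
   do i = 1, n
      b(i) = a(i) * 2.0
   end do
!$acc end region

   ! inner (nested) data region: c is a device-only scratch array
   ! that exists just for the two compute regions below
!$acc data region local(c)
!$acc region
   do i = 1, n
      c(i) = b(i) + 1.0
   end do
!$acc end region
!$acc region
   do i = 1, n
      a(i) = c(i) * 0.5
   end do
!$acc end region
!$acc end data region

!$acc end data region

   print *, a(1), a(n)
end program nested_data
```

The point of the inner region is only to limit how long c occupies device memory; a and b stay resident for the whole outer region.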

Quote:
3) IF/SWITCH clauses have to be avoided only within compute ACC regions; can I use them normally in data regions?

No, the if clause only applies to compute regions.

Quote:
4) Can I put a subroutine call within a data region?

In host code, yes. In an acc compute region, no.

Quote:
(where that subroutine itself contains compute regions)?

Soon. The PGI 2011 (aka 11.0) release in November will allow data regions to span across subroutine calls using the 'reflected' directive. Note that this will be a Fortran-only feature.

Quote:
5) Could you please provide a short code example showing how to use the update clause?


Code:
% cat update.f90


program foo

   real, dimension(1024) :: A, B
   integer i
   A = 1.0

!$acc data region copy(A), local(B)
!$acc region
   do i=1,1024
     B(i) = A(i) / 2
     A(i) = A(i) * i
   end do
!$acc end region

! update the host copy of A and print the intermediary values
!$acc update host(A)
   print *, A(1), A(1024)

!$acc region
   do i=1,1024
     A(i) = B(i) * 2
   end do
!$acc end region

!$acc end data region

   print *, A(1), A(1024)

   end program foo

% pgf90 update.f90 -V10.9 -ta=nvidia -Minfo=accel
foo:
      9, Generating local(b(:))
         Generating copy(a(:))
     10, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     11, Loop is parallelizable
         Accelerator kernel generated
         11, !$acc do parallel, vector(256)
             Using register for 'a'
             CC 1.0 : 6 registers; 20 shared, 24 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 6 registers; 20 shared, 24 constant, 0 local memory bytes; 100 occupancy
     17, Generating !$acc update host(a(:))
     20, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     21, Loop is parallelizable
         Accelerator kernel generated
         21, !$acc do parallel, vector(256)
             CC 1.0 : 3 registers; 20 shared, 24 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 3 registers; 20 shared, 24 constant, 0 local memory bytes; 100 occupancy
% a.out
    1.000000        1024.000
    1.000000        1.000000


Hope this helps,
Mat

Ankhazam
Joined: 24 Aug 2010
Posts: 7

Posted: Tue Sep 28, 2010 2:19 am

Hello,
thank you very much for such a fast and comprehensive reply.

Quote:
Quote:
3) IF/SWITCH clauses have to be avoided only within compute ACC regions; can I use them normally in data regions?

No, the if clause only applies to compute regions.

My mistake, I meant classic IF/SWITCH Fortran statements, not the PGI Acc clauses.

Could you also answer the environment questions?
I also found a thread about using Accelerator regions with OpenMP regions. Is this feature already available, and if not, when can we expect it?
What new features can we expect in the upcoming releases?
Besides OpenMP, is there a possibility for the Accelerator Programming Model to act like OpenCL, i.e. to empower heterogeneous architectures? I am aware of the Unified Binary technology, but it won't automatically deploy on a computing cluster node to balance all the computations across CPUs and GPUs where the number of CPU cores >> the number of GPUs.

mkcolg
Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Sep 28, 2010 8:42 am

Quote:
My mistake, I meant classic IF/SWITCH Fortran statements, not the PGI Acc clauses.

No problem. You can have any Fortran statements in the host code that is within the data region.
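
To illustrate, here is a rough sketch with an ordinary host DO loop and IF statement inside a data region, wrapped around a compute region. The names (host_control, step, verbose) are hypothetical, and the directive syntax is assumed to be the same 10.x syntax as in the update example earlier in the thread.

```fortran
program host_control
   implicit none
   integer, parameter :: n = 1024
   real :: a(n)
   integer :: i, step
   logical :: verbose   ! hypothetical switch, for illustration only

   a = 1.0
   verbose = .true.

   ! a is copied to the device once here and copied back once at the end
!$acc data region copy(a)
   do step = 1, 3                 ! plain host loop inside the data region
!$acc region
      do i = 1, n
         a(i) = a(i) + 1.0
      end do
!$acc end region

      if (verbose) then           ! plain host IF inside the data region
         ! refresh the host copy only when it is actually needed
!$acc update host(a)
         print *, 'step', step, a(1)
      end if
   end do
!$acc end data region

   print *, a(1), a(n)
end program host_control
```

The IF and the outer DO run on the host; only the inner loop between region and end region is offloaded.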

Quote:
I also found a thread about using Accelerator regions with OpenMP regions. Is this feature already available, and if not, when can we expect it?

You can use accelerator regions within an OpenMP parallel region. The only caveat is that you can't use both on the same loop. The basic outline is:

Code:

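! threadid must be private so that each thread keeps its own value
!$omp parallel private(threadid)

threadid = omp_get_thread_num()

! set your device (one GPU per OpenMP thread)
call acc_set_device_num(threadid, acc_device_nvidia)

! start your accelerator region or call a routine that contains the acc directives.
!$acc region
... etc.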


On a side note, you can also use acc directives within MPI code or even hybrid MPI/OpenMP code.
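
For the MPI case, a rough sketch along the same lines as the OpenMP outline above could look like this. It assumes all ranks run on a single node (so the rank can be mapped directly to a GPU number), an MPI library that provides the Fortran mpi module, and the accel_lib module for the runtime routines. The program name and the arithmetic are made up for illustration.

```fortran
program mpi_gpu_select
   use mpi          ! assumption: the MPI library provides this module
   use accel_lib    ! PGI Accelerator runtime routines
   implicit none
   integer, parameter :: n = 1024
   real :: a(n)
   integer :: i, rank, ierr, ngpus

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! Assumption: every rank is on the same node, so rank modulo the
   ! number of GPUs gives each process its own device.
   ngpus = acc_get_num_devices(acc_device_nvidia)
   if (ngpus > 0) then
      call acc_set_device_num(mod(rank, ngpus), acc_device_nvidia)
   end if

   a = real(rank)

!$acc region
   do i = 1, n
      a(i) = a(i) + 1.0
   end do
!$acc end region

   print *, 'rank', rank, 'a(1) =', a(1)

   call MPI_Finalize(ierr)
end program mpi_gpu_select
```

On a multi-node cluster the mapping would have to use a node-local rank instead of the global one; how to obtain that depends on the MPI setup and is not shown here.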

Quote:
What new features can we expect in the upcoming releases?

Support for the reflected and mirror clauses will be available in the 11.0 release.

Quote:
Besides OpenMP, is there a possibility for the Accelerator Programming Model to act like OpenCL, i.e. to empower heterogeneous architectures? I am aware of the Unified Binary technology, but it won't automatically deploy on a computing cluster node to balance all the computations across CPUs and GPUs where the number of CPU cores >> the number of GPUs.

I assume you mean that you would like the accelerator work to be split across both the GPU and the CPU, not just the either/or support found with Unified Binary?

The short answer is no.

I don't know OpenCL myself, but I don't see how this could be done effectively in an automatic and general way. The lack of a unified memory is a major problem, and load balancing would be algorithm dependent. Now, you could certainly do this yourself, for example with one OpenMP thread running on the GPU and another on the CPU, but the compiler simply doesn't have enough information to make good choices to do this automatically.
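
As a rough illustration of the do-it-yourself approach, the sketch below splits one loop by hand: one OpenMP section offloads the first part of the array, the other processes the rest on the CPU. The split point nsplit is a hypothetical, hand-picked value (choosing it well is exactly the algorithm-dependent load balancing mentioned above), and placing an accelerator region inside an OpenMP section is assumed to behave like the parallel-region outline earlier in this post.

```fortran
program split_cpu_gpu
   implicit none
   integer, parameter :: n = 1024
   integer, parameter :: nsplit = n / 2   ! hypothetical, hand-tuned split
   real :: a(n)
   integer :: i

   a = 1.0

!$omp parallel sections private(i)

!$omp section
   ! GPU share: only a(1:nsplit) is moved to and from the device,
   ! so it does not overlap with the part the CPU is writing
!$acc region copy(a(1:nsplit))
   do i = 1, nsplit
      a(i) = a(i) * 2.0
   end do
!$acc end region

!$omp section
   ! CPU share: the remaining elements are processed on the host
   do i = nsplit + 1, n
      a(i) = a(i) * 2.0
   end do

!$omp end parallel sections

   print *, a(1), a(n)
end program split_cpu_gpu
```

With the PGI compilers this would presumably be built with both -mp and -ta=nvidia. Without either flag the directives are plain comments and the two loops simply run one after the other on the host.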

Hope this helps,
Mat

Ankhazam
Joined: 24 Aug 2010
Posts: 7

Posted: Wed Sep 29, 2010 1:13 am

Thanks, now I understand the full idea of OpenMP+MPI+PGIAcc, though I have one more small question about the exchange below:

mkcolg wrote:

Quote:
4) Can I put a subroutine call within a data region?

In host code, yes. In an acc compute region, no.

Quote:
(that subroutines has then computing regions within)?

Soon. The PGI 2011 (aka 11.0) release in November will allow data regions to span across subroutine calls using the 'reflected' directive. Note that this will be a Fortran-only feature.


Would code like this work:
Code:

!$acc data region copyin(a,b,c,d), local(e,f)

...

!$acc region
do
...
end do
!$acc end region
...

call subfunct1(a,c,e)

...

!$acc region
do
...
end do
!$acc end region
...

call subfunct2(a,b,d,e,f)

!$acc end data region


Subfunct1 and subfunct2 would also contain !$acc regions, but without an explicit data region, and I want to avoid extra H2D and D2H data transfers.
Would the compiler automatically remap these variables within the subroutines and inline them, or do I have to pull the raw code out of these subroutines and put it here in place of the calls?

Thanks again in advance for your support,
Nicolas
All times are GMT - 7 Hours
Page 1 of 3

 