PGI User Forum

Example for "device present" Directive

 
sindimo



Joined: 30 Nov 2010
Posts: 29
Location: Saudi Aramco

Posted: Wed Dec 29, 2010 12:35 am    Post subject: Example for "device present" Directive

Can you please provide an example of how to use the new PGI 11 "device present" directive? I am a bit confused about the difference between the "device present" and "reflected" directives.

Thank you
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Jan 03, 2011 2:27 pm    Post subject: Re: Example for "device present" Directive

Hi sindimo,

The main difference is when the association between the host and device copies of the data occurs. With 'reflected', the association occurs at compile time. With 'present', the association occurs at run time. This allows 'present' to associate global variables as well as arguments. Also, if you are passing device data down through multiple calls, 'present' removes the need to add 'reflected' to each subroutine.

Note that "present" is new in the 1.3 version of the PGI Accelerator Model design spec and will be available later this year.
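
To make the run-time association concrete, here is a minimal sketch of the multi-call case described above. It is written in OpenACC syntax (the standard that later grew out of the PGI Accelerator model), which is an assumption: the exact spelling in the PGI 1.3 spec may differ. The module and routine names (mm_present, driver, middle, scale_sum) are hypothetical and chosen only for illustration.

```fortran
! Sketch only: OpenACC-style directives, assumed to be close to PGI's 'present'.
! Without an accelerator-enabled compiler the directives are plain comments
! and the program runs on the host with the same result.
module mm_present
  implicit none
contains
  ! Innermost routine: 'present' states that a and b are already on the
  ! device; the lookup happens at run time, so no 'reflected' is needed here.
  subroutine scale_sum(a, b, c, w)
    real, intent(inout) :: a(:,:)
    real, intent(in)    :: b(:,:), c(:,:), w(2)
    integer :: i, j
    !$acc kernels present(a, b) copyin(c, w)
    do j = 1, size(a,2)
      do i = 1, size(a,1)
        a(i,j) = b(i,j)*w(1) + c(i,j)*w(2)
      end do
    end do
    !$acc end kernels
  end subroutine scale_sum

  ! Intermediate routine: carries no directives at all, which is the case
  ! where 'reflected' would otherwise have to be repeated.
  subroutine middle(a, b, c, w)
    real, intent(inout) :: a(:,:)
    real, intent(in)    :: b(:,:), c(:,:), w(2)
    call scale_sum(a, b, c, w)
  end subroutine middle

  ! Outermost routine: the data region puts a and b on the device once.
  subroutine driver(a, b, c, w)
    real, intent(inout) :: a(:,:)
    real, intent(in)    :: b(:,:), c(:,:), w(2)
    !$acc data copyin(b) copy(a)
    call middle(a, b, c, w)
    !$acc end data
  end subroutine driver
end module mm_present

program p
  use mm_present
  implicit none
  integer, parameter :: n = 40, m = 50
  real :: a(n,m), b(n,m), c(n,m), w(2)
  integer :: i, j
  do j = 1, m
    do i = 1, n
      a(i,j) = -1.0
      b(i,j) = real(j*100 + i)
      c(i,j) = real(-(j*100) + i)
    end do
  end do
  w = (/ 1.5, 0.5 /)
  call driver(a, b, c, w)
  ! a(i,j) = 100*j + 2*i, so the two values are 510 and 3876
  ! (the printed formatting depends on the compiler).
  print *, a(5,5), a(n-2,n-2)
end program p
```

Only driver knows how the data reaches the device; middle passes the arrays through untouched, and scale_sum simply states that they are already there.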

Hope this helps,
Mat
wsawyer



Joined: 19 Jan 2011
Posts: 7

Posted: Wed Jan 19, 2011 4:58 am    Post subject: Examples for reflected and device present directives

Also we would be interested in examples of the reflected and device present directives. Could anyone provide a pointer?

Thanks, --Will
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Wed Jan 19, 2011 11:58 am    Post subject: Re: Examples for reflected and device present directives

Hi Will,

Here's an example of using 'reflected' and 'mirror'. 'device present' isn't implemented yet, so I don't have an example of it.

Hope this helps,
Mat

Code:
$ cat test.f90

module mm
 implicit none
 integer, parameter :: n=40,m=50
 integer :: oo = 2
 real, dimension(:,:), allocatable :: a
 !$acc mirror(a)

contains
 subroutine sub1( b, c, w )
  implicit none
  real :: b(:,:), c(:,:), w(2)
  !$acc reflected(b)
  integer :: i,j
  !$acc region
   do j = oo+1,ubound(a,2)-oo
    do i = oo+1,ubound(a,1)-oo
     a(i,j) = b(i,j)*w(1) + c(i,j)*w(2)
    enddo
   enddo
  !$acc end region
 end subroutine

 subroutine sub2( b, c, w )
  implicit none
  real :: b(:,:), c(:,:), w(2)
  integer :: n, m
  integer :: i

  !$acc data region copyin(b)
  do i = 1,2
   call sub1(b,c,w )
  enddo
  !$acc end data region

 end subroutine
end module

program p
 use mm
 use accel_lib
 implicit none
 real :: b(n,m), c(n,m), w(2), aa(n,m)
 integer :: i,j
 allocate(a(n,m))
 do j = 1,m
  do i = 1,n
   aa(i,j) = -1.0
   a(i,j) = -1.0
   b(i,j) = (j*100) + i
   c(i,j) = -(j*100) + i
  enddo
 enddo

  w(1) = 1.5
  w(2) = 0.5
  call sub2(b,c,w)
  !$acc update host(a(oo+1:n-oo,oo+1:m-oo))

  print *, a(5,5), a(n-2,n-2)
 
end program

$ pgf90 test.f90 -Minfo; a.out
    510.0000        3876.000   
$ pgf90 mm40.f90 -Minfo -ta=nvidia; a.out
sub1:
     18, Generating local(b(:,:))
     20, Generating copyin(c(:,:))
         Generating copyin(w(1:2))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     21, Loop is parallelizable
     22, Loop is parallelizable
         Accelerator kernel generated
         21, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
         22, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             Cached references to size [2] block of 'w'
             CC 1.0 : 14 registers; 120 shared, 16 constant, 0 local memory bytes; 66% occupancy
             CC 1.3 : 14 registers; 120 shared, 16 constant, 0 local memory bytes; 100% occupancy
             CC 2.0 : 20 registers; 16 shared, 116 constant, 0 local memory bytes; 100% occupancy
sub2:
     35, Generating copyin(b(:,:))
p:
     63, Generating !$acc update host(a(oo+1:-oo+40,oo+1:-oo+50))
    510.0000        3876.000 
All times are GMT - 7 Hours