PGI User Forum
Keeping data on GPU while looping and calling subroutines
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Mon Mar 05, 2012 10:16 am

Quote:
Can the interface be between two subprograms?
Yes.
sslgamess



Joined: 23 Nov 2009
Posts: 35

Posted: Tue Mar 06, 2012 6:44 pm

Hi Mat,

Sorry for coming back to this post.

The interface and module stuff is new to me.

Could you show me what the mirrored example would look like without a module (using an explicit interface with the subroutine)?

And let's say that the call tree goes through an intermediate subroutine before reaching the subroutine containing the GPU kernel:

program main > subroutine intermediate > subroutine accumulateTrigo

Would I need an interface between main and intermediate, and another between intermediate and accumulateTrigo?

Thanks,
Sarom
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Wed Mar 07, 2012 9:57 am

Here you go. Using the initial reflected example as the base, I made X allocatable and added the mirror directive. mirror creates an implicit data region with the same scope as the variable, so I removed the original explicit data region. I also put X's initialisation loop into a compute region so that I didn't need to copy the data (otherwise you need an update directive to get the data over to the GPU).
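For readers who would rather initialise X on the host, the update directive mentioned above might be used roughly as follows. This is only a minimal sketch: it assumes the PGI Accelerator "update device" form, the program name update_sketch and the array size are made up for illustration, and it has not been run on a GPU. Compilers without accelerator support treat the !$acc lines as comments.

```fortran
program update_sketch                 ! hypothetical program name
    implicit none
    real, allocatable, dimension(:) :: X
    integer :: Xsize, i
!$acc mirror (X)

    Xsize = 1000
    allocate(X(Xsize))                ! with mirror, the device copy is allocated too

    ! initialisation on the host: the device copy is not yet initialised
    do i = 1, Xsize
        X(i) = i*2.0
    enddo

    ! assumed directive form: push the host values of X to the device copy
!$acc update device (X)

    ! host-side sanity check only; it says nothing about the device copy
    if (X(Xsize) /= 2.0*Xsize) stop 1
    print *, 'X(1) = ', X(1), ' X(Xsize) = ', X(Xsize)
    deallocate(X)
end program update_sketch
```

Putting the initialisation loop in a compute region, as in the listings in this post, avoids this extra transfer.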

Quote:
Would I need an interface between main and intermediate, and another between intermediate and accumulateTrigo?
Yes, and I updated the example (reflect3.f90) to reflect this. However, it would be uncommon to do this. More likely you would move these routines into a module, where the interface is created implicitly, or create a module that contains nothing but an interface (reflect4.f90). The C equivalent would be a header file with function prototypes.

Much of your challenge with GAMESS will be porting it to F90. But, at least in my opinion, F90 is a much better language than F77 and well worth the effort.

- Mat

Code:
% cat reflect3.f90

subroutine accumulateTrigo(a, size, sum)
    integer :: ii,jj, size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
    do jj=1,500
      sum=0.0
!$acc region
      do ii=1,size
        sum = sum + sin(a(ii)) ** 2 + cos(a(ii)) ** 2
      enddo
!$acc end region
    enddo
    return
end subroutine


subroutine intermediate(a, size, sum)
    integer ::  size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)

interface
  subroutine accumulateTrigo(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine accumulateTrigo
end interface

    print *, 'INTER size=', size
    call accumulateTrigo(a, size, sum)
    print *, 'INTER sum=', sum

end subroutine intermediate
                       
program main
#ifdef _ACCEL
    use accel_lib      ! declares acc_init and acc_device_nvidia
#endif
    real, allocatable, dimension(:) ::  X
    integer :: Xsize,m,i,k,c1,c2   
    real :: lastSum
!$acc mirror (X)
 
interface
  subroutine intermediate(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine intermediate
end interface
 
    Xsize = 100000
    allocate(X(Xsize)) 
    m = 5           ! m calls to subroutine intermediate
 
! GPU initialization
#ifdef _ACCEL
    call acc_init( acc_device_nvidia )
#endif   

! initialization of array X
!$acc region do
    do i = 1,Xsize
        X(i) = (i*2.0)
    enddo

! computations on GPU   
    call system_clock( count=c1 )
    do k= 1, m     
        call intermediate(X, Xsize, lastSum)
    enddo
   
    print *, "LAST = ", lastSum
    call system_clock( count=c2 )
    print *, (c2-c1)/1000.0, ' milliseconds'
end program

Code:

% cat reflect4.f90

module myinter

interface
  subroutine accumulateTrigo(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine accumulateTrigo

  subroutine intermediate(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine intermediate

end interface

end module myinter


subroutine accumulateTrigo(a, size, sum)
    integer :: ii,jj, size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
    do jj=1,500
      sum=0.0
!$acc region
      do ii=1,size
        sum = sum + sin(a(ii)) ** 2 + cos(a(ii)) ** 2
      enddo
!$acc end region
    enddo
    return
end subroutine


subroutine intermediate(a, size, sum)
    use myinter
    integer ::  size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)

    print *, 'INTER size=', size
    call accumulateTrigo(a, size, sum)
    print *, 'INTER sum=', sum

end subroutine intermediate
                       
program main
    use myinter
#ifdef _ACCEL
    use accel_lib      ! declares acc_init and acc_device_nvidia
#endif
    real, allocatable, dimension(:) ::  X
    integer :: Xsize,m,i,k,c1,c2   
    real :: lastSum
!$acc mirror (X)
 
    Xsize = 100000
    allocate(X(Xsize)) 
    m = 5           ! m calls to subroutine intermediate
 
! GPU initialization
#ifdef _ACCEL
    call acc_init( acc_device_nvidia )
#endif   

! initialization of array X
!$acc region do
    do i = 1,Xsize
        X(i) = (i*2.0)
    enddo

! computations on GPU   
    call system_clock( count=c1 )
    do k= 1, m     
        call intermediate(X, Xsize, lastSum)
    enddo
   
    print *, "LAST = ", lastSum
    call system_clock( count=c2 )
    print *, (c2-c1)/1000.0, ' milliseconds'
end program
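
The other option mentioned in this post, moving the routines themselves into a module, is not shown in the two listings. A rough sketch of how it might look follows. The module name trigo_mod is hypothetical, the directives are copied from reflect4.f90 and assumed to behave the same way inside a module, the array is smaller, and the acc_init call and timing are left out to keep it short. It has not been tested with the PGI compiler; compilers without accelerator support treat the !$acc lines as comments.

```fortran
module trigo_mod                      ! hypothetical module name
    implicit none
contains

    subroutine accumulateTrigo(a, size, sum)
        integer :: ii, jj, size
        real, dimension(size) :: a
        real :: sum
!$acc reflected (a)
        do jj = 1, 500
            sum = 0.0
!$acc region
            do ii = 1, size
                sum = sum + sin(a(ii))**2 + cos(a(ii))**2
            enddo
!$acc end region
        enddo
    end subroutine accumulateTrigo

    subroutine intermediate(a, size, sum)
        integer :: size
        real, dimension(size) :: a
        real :: sum
!$acc reflected (a)
        ! same module, so the interface of accumulateTrigo is already known here
        call accumulateTrigo(a, size, sum)
    end subroutine intermediate

end module trigo_mod

program main
    use trigo_mod                     ! makes both interfaces explicit in main
    implicit none
    real, allocatable, dimension(:) :: X
    integer :: Xsize, i
    real :: lastSum
!$acc mirror (X)

    Xsize = 1000
    allocate(X(Xsize))

!$acc region do
    do i = 1, Xsize
        X(i) = i*2.0
    enddo

    call intermediate(X, Xsize, lastSum)
    print *, 'LAST = ', lastSum
    ! each term is sin**2 + cos**2, so the sum should be close to Xsize
    if (abs(lastSum - Xsize) > 0.5) stop 1
end program main
```

No interface blocks are written by hand here: a "use" of the module is enough, and the interfaces cannot drift out of step with the routines.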

sslgamess



Joined: 23 Nov 2009
Posts: 35

Posted: Thu Mar 08, 2012 2:46 am

Thanks Mat.

You are right.

Utilizing mirrored and reflected requires replacing those X() arrays with allocatable arrays.

We also lack a Fortran 90 expert in the group, so getting a handle on the 'interface' and 'module' constructs will be interesting.

Which brings up my next question.

How do I properly treat allocatable arrays with an interface?
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Thu Mar 08, 2012 11:06 am

Quote:
Utilizing mirrored and reflected requires replacing those X() arrays with allocatable arrays.
Mirror is for use with allocatable arrays, since it "mirrors" the allocation status of the array on both the host and device. However, reflected can also be used with static arrays (see the first reflected.f90 example).
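As a rough illustration of reflected with a static array, the sketch below wraps the call in an explicit data region. It is not the original reflected.f90: the routine scaleStatic and the program name are made up, the "data region" with a copy clause is assumed to be the PGI Accelerator form, and the code has not been run on a GPU. Compilers without accelerator support treat the !$acc lines as comments.

```fortran
subroutine scaleStatic(a, n)          ! hypothetical routine
    implicit none
    integer :: n, ii
    real, dimension(n) :: a
!$acc reflected (a)
!$acc region
    do ii = 1, n
        a(ii) = 2.0*a(ii)
    enddo
!$acc end region
end subroutine scaleStatic

program static_sketch                 ! hypothetical program name
    implicit none
    interface
      subroutine scaleStatic(a, n)
        integer :: n
        real, dimension(n) :: a
!$acc reflected (a)
      end subroutine scaleStatic
    end interface
    integer, parameter :: n = 1000
    real, dimension(n) :: X           ! fixed size, not allocatable
    integer :: i

    do i = 1, n
        X(i) = real(i)
    enddo

    ! assumed directive form: X is copied in here and copied back at the end
!$acc data region copy (X)
    call scaleStatic(X, n)
!$acc end data region

    if (X(n) /= 2.0*n) stop 1
    print *, 'X(n) = ', X(n)
end program static_sketch
```

Without mirror there is no implicit data region, so the explicit one is what makes the device copy of X exist when the reflected routine is called.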

Quote:
How do I properly treat allocatable arrays with an interface?

Use assumed-shape array syntax. The extents of "a" are determined at run time, while its type, kind, and rank are known at compile time. The subroutine's own declaration of "a" must use the same assumed-shape form.
Code:
interface
  subroutine accumulateTrigo(a, size, sum)
    integer :: size
    real, dimension(:) :: a
    real :: sum
!$acc reflected (a)
  end subroutine accumulateTrigo
end interface
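
To show the assumed-shape form in a complete program, here is a minimal host-only sketch. It departs from the interface above in a few labelled ways: the length argument is dropped in favour of the intrinsic size(), the result is named total so that it does not hide the intrinsic sum, and the !$acc directives are left out so that it runs with any Fortran compiler. With the accelerator, "!$acc reflected (a)" would go where the interface above shows it.

```fortran
subroutine accumulateTrigo(a, total)
    implicit none
    real, dimension(:), intent(in) :: a   ! assumed shape: the extent comes from the caller
    real, intent(out) :: total
    integer :: ii
    total = 0.0
    do ii = 1, size(a)                    ! intrinsic size(), no separate length argument
        total = total + sin(a(ii))**2 + cos(a(ii))**2
    enddo
end subroutine accumulateTrigo

program assumed_shape_sketch              ! hypothetical program name
    implicit none
    ! an assumed-shape dummy argument requires an explicit interface in the caller
    interface
      subroutine accumulateTrigo(a, total)
        real, dimension(:), intent(in) :: a
        real, intent(out) :: total
      end subroutine accumulateTrigo
    end interface
    real, allocatable, dimension(:) :: X
    real :: lastSum
    integer :: i

    allocate(X(1000))
    do i = 1, size(X)
        X(i) = i*2.0
    enddo

    call accumulateTrigo(X, lastSum)
    print *, 'elements = ', size(X), ' sum = ', lastSum
    ! each term is sin**2 + cos**2, so the sum should be close to the element count
    if (abs(lastSum - size(X)) > 0.5) stop 1
    deallocate(X)
end program assumed_shape_sketch
```

The caller passes the allocatable array as is; the subroutine sees an ordinary array whose bounds start at 1.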
All times are GMT - 7 Hours