PGI User Forum

Keeping data on GPU while looping and calling subroutines
sslgamess



Joined: 23 Nov 2009
Posts: 35

Posted: Fri Mar 09, 2012 4:40 pm

Hi Mat,

I was able to get reflected to work in GAMESS.

However, I would like to make use of mirror.

If you don't mind, could you rewrite your mirror.f90 example without using modules.

I don't want to start using the module construct yet until I have a better understanding of it.

I am comfortable with interfaces.

Many thanks again,
Sarom
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Fri Mar 09, 2012 5:22 pm

Hi Sarom,

Quote:
I was able to get reflected to work in GAMESS.
Excellent.

Quote:
If you don't mind, could you rewrite your mirror.f90 example without using modules.
Already done. That's the 'reflect3.f90' example.
Quote:

I don't want to start using the module construct yet until I have a better understanding of it.
I am comfortable with interfaces.
You'll find using modules is a lot easier (at least a lot less typing) than writing out explicit interfaces for every subroutine. Though, I certainly understand.
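
To illustrate the difference in typing, here is a minimal sketch of the module approach. It assumes the accumulateTrigo routine used elsewhere in this thread; the module name trigo_mod is hypothetical. A routine placed after "contains" in a module gets an explicit interface automatically, so any caller that adds "use trigo_mod" needs no separate interface block. That the reflected directive is carried along with that interface is an assumption here and has not been checked against a particular PGI release.

```fortran
! Sketch only: module name trigo_mod is hypothetical.
! Callers write "use trigo_mod" instead of a hand-written interface block.
module trigo_mod
contains
  subroutine accumulateTrigo(a, size, sum)
    integer :: ii, size
    real, dimension(size) :: a
    real :: sum
! Assumed to be visible to callers through the module's explicit interface
!$acc reflected (a)
    sum = 0.0
!$acc region
    do ii = 1, size
      sum = sum + sin(a(ii))**2 + cos(a(ii))**2
    enddo
!$acc end region
  end subroutine accumulateTrigo
end module trigo_mod
```

With explicit interfaces, the declarations and the reflected directive have to be repeated once in the interface block and once in the subroutine itself, and the two copies must be kept in sync by hand.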

Best Regards,
Mat
sslgamess



Joined: 23 Nov 2009
Posts: 35

Posted: Sun Mar 11, 2012 12:58 am    Post subject: Timing the copyin of the reflected arrays.

Hi Mat,

I am trying to implement the mirror directive in GAMESS but I get a warning of:

Invalid accelerator data region: not a single-entry single-exit region


I suspected this to be due to the presence of an ENTRY statement in the subroutine. So, I modified the sample code from this thread to include an ENTRY statement and was able to reproduce the warning message.

Is there a workaround for mirror directives and the presence of the ENTRY statement?

Code:
module myinter

interface
  subroutine accumulateTrigo(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine accumulateTrigo

  subroutine intermediate(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine intermediate
end interface
end module myinter


subroutine accumulateTrigo(a, size, sum)
    integer :: ii,jj, size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
    do jj=1,500
    sum=0.0
!$acc region
      do ii=1,size
            sum = sum + sin(a(ii)) ** 2 + cos(a(ii)) ** 2
      enddo
!$acc end region     
    enddo
    return
end subroutine


subroutine intermediate(a, size, sum)
    use myinter
    integer ::  size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)

    print *, 'INTER size=', size
    call accumulateTrigo(a, size, sum)
    print *, 'INTER sum=', sum

end subroutine intermediate
                       
subroutine mirror
    use myinter
    real, allocatable, dimension(:) ::  X
    integer :: Xsize,m,i,k,c1,c2   
    real :: lastSum
!$acc mirror (X)
   
    Xsize = 100000

    entry mirror_jump(dummy_arg)

    allocate(X(Xsize))
    m = 5           ! m calls to subroutine accumulateTrigo
 
! GPU initialization
#ifdef _ACCEL
    call acc_init( acc_device_nvidia )
#endif   

! initialization of array X
    do i = 1,Xsize
        X(i) = (i*2.0)
    enddo
!$acc update device(X)

! computations on GPU   
    call system_clock( count=c1 )
    do k= 1, m     
        call intermediate(X, Xsize, lastSum)
    enddo

    print *, "LAST = ", lastSum
    call system_clock( count=c2 )
    print *, (c2-c1)/1000.0, ' milliseconds'
end subroutine

program main
    call mirror
end program


Here is a reflected version that works with an ENTRY statement.

Code:
module myinter

interface
  subroutine accumulateTrigo(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine accumulateTrigo

  subroutine intermediate(a, size, sum)
    integer :: size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
  end subroutine intermediate
end interface
end module myinter


subroutine accumulateTrigo(a, size, sum)
    integer :: ii,jj, size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)
    do jj=1,500
    sum=0.0
!$acc region
      do ii=1,size
            sum = sum + sin(a(ii)) ** 2 + cos(a(ii)) ** 2
      enddo
!$acc end region     
    enddo
    return
end subroutine


subroutine intermediate(a, size, sum)
    use myinter
    integer ::  size
    real, dimension(size) :: a
    real :: sum
!$acc reflected (a)

    print *, 'INTER size=', size
    call accumulateTrigo(a, size, sum)
    print *, 'INTER sum=', sum

end subroutine intermediate
                       
subroutine reflected(Xsize)
    use myinter
    integer :: Xsize,m,i,k,c1,c2
    real :: lastSum
    real, dimension(:) ::  X(Xsize)
   
    entry mirror_jump(dummy_args)
   
    m = 5           ! m calls to subroutine accumulateTrigo
 
! GPU initialization
#ifdef _ACCEL
    call acc_init( acc_device_nvidia )
#endif   

! initialization of array X
    do i = 1,Xsize
        X(i) = (i*2.0)
    enddo
!$acc data region copyin(X)

! computations on GPU   
    call system_clock( count=c1 )
    do k= 1, m     
        call intermediate(X, Xsize, lastSum)
    enddo
!$acc end data region
   
    print *, "LAST = ", lastSum
    call system_clock( count=c2 )
    print *, (c2-c1)/1000.0, ' milliseconds'
end subroutine

program main
    call reflected(100000)
end program
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Mar 12, 2012 10:55 am

Hi Sarom,

Quote:
Is there a work around for mirror directives and the presence of the ENTRY statement?
Data and compute regions must have a single entry and exit point. So the workaround in the mirror case is to move the data region so it's declared after the entry statement.

The directive "!$acc mirror(X)" creates an implicit data region having the same scope as the X array. However, "mirror" can also be a clause in an explicit data region.
Code:

    entry mirror_jump(dummy_arg)

! GPU initialization
#ifdef _ACCEL
    call acc_init( acc_device_nvidia )
#endif   

!$acc data region mirror (X)
    allocate(X(Xsize))
    m = 5           ! m calls to subroutine accumulateTrigo

Though, there's no real advantage to using a mirror clause here. Depending on when the X array is allocated, the same thing could be expressed with the local clause.
Code:

    entry mirror_jump(dummy_arg)

    allocate(X(Xsize))

! GPU initialization
#ifdef _ACCEL
    call acc_init( acc_device_nvidia )
#endif   

!$acc data region local (X)
    m = 5           ! m calls to subroutine accumulateTrigo
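
Putting the pieces together, the whole routine might look roughly like the sketch below. It is untested against a PGI release and makes a few assumptions that are not in the snippets above: Xsize is set after the ENTRY statement so that both entry points see a defined value, the region is closed with "!$acc end data region" before the final print, and "update device" is used to fill the device copy because the local clause only allocates it. The myinter module and the intermediate routine are the ones from the earlier post.

```fortran
! Sketch only: assumes module myinter and subroutine intermediate
! from the earlier post in this thread.
subroutine mirror
    use myinter
    real, allocatable, dimension(:) :: X
    integer :: Xsize, m, i, k
    real :: lastSum, dummy_arg

    entry mirror_jump(dummy_arg)

    ! Assumption: set the size after the ENTRY so both entry points define it
    Xsize = 100000
    allocate(X(Xsize))
    m = 5           ! m calls to subroutine accumulateTrigo

    ! initialization of array X on the host
    do i = 1, Xsize
        X(i) = i*2.0
    enddo

    ! The explicit data region opens after the ENTRY statement,
    ! so it has exactly one entry point.
!$acc data region local (X)
    ! local only allocates X on the device; copy the host values over
!$acc update device(X)
    do k = 1, m
        call intermediate(X, Xsize, lastSum)
    enddo
    ! ...and exactly one exit point, before the routine returns.
!$acc end data region

    print *, "LAST = ", lastSum
    deallocate(X)
end subroutine mirror
```

If the update step is not wanted, replacing the local clause with copyin(X), as in the reflected version posted earlier, should have the same effect.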


- Mat
All times are GMT - 7 Hours