PGI User Forum


OpenAcc not allocating memory on GPU

 
Patnaik



Joined: 01 Aug 2012
Posts: 2

Posted: Wed Aug 01, 2012 7:34 am    Post subject: OpenAcc not allocating memory on GPU

Hi,

I am trying to allocate memory on the GPU that persists between subroutine calls. My understanding is that
Code:
c$acc declare device_resident(a, b)
when placed in a module will ensure that a and b, when allocated, exist on the GPU for the duration of the program. The program below compiles fine, but fails at run-time with the following error:
FATAL ERROR: data in PRESENT clause was not found: name=b
file:/lcpscratch/patnaik/openACC/tests/test2.f init line:21
My best guess is that the allocation is happening on the CPU rather than on the GPU. I do not want a data directive listing all GPU variables in the main program; I want to isolate them in modules. Please help.

Regards, Gopal
Code:


c
compile with: pgfortran -acc -Minfo=accel test2.f
c

      module acc_data

      integer, parameter :: NX = 100000, NY = 1000
c$acc declare device_resident(a, b)
      real, allocatable, save, dimension(:,:) :: a, b
      real, allocatable, save, dimension(:,:) :: c
     
      contains

      subroutine init

      integer :: i, j

      allocate (a(NX,NY),b(NX,NY))
      allocate (c(NX,NY))

c$acc kernels loop present (a,b)
      do j = 1, NY
         do i = 1, NX
            a(i,j) = 1.3
            b(i,j) = 3.4
         end do
      end do

      return
      end subroutine init

      end module acc_data

      program test2

      use acc_data
      implicit none
      integer :: i, j

      call init

c$acc kernels loop present (a,b) copyout(c)
      do j = 1, NY
         do i = 1, NX
            c(i,j) = a(i,j)**b(i,j)
         end do
      end do
c$acc end kernels loop

      write(*,*)sum(c(1,:))
      stop
      end
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Wed Aug 01, 2012 2:47 pm

Hi Gopal,

Unfortunately, there are a few dangling OpenACC features yet to be implemented, including device_resident. The others are host data and last private. In the meantime, you'll need to reorganise your code a bit to use data regions instead.

For example:
Code:
% cat test2.f90
c
compile with: pgfortran -acc -Minfo=accel test2.f
c
      module acc_data

      integer, parameter :: NX = 100000, NY = 1000
      real, allocatable, dimension(:,:) :: a, b
      real, allocatable, dimension(:,:) :: c
cacc declare device_resident(a, b)
     
      contains

      subroutine alloc

      integer :: i, j
      allocate (a(NX,NY),b(NX,NY))
      allocate (c(NX,NY))

      end subroutine alloc
      subroutine init

      integer :: i, j
c$acc kernels loop present (a,b)
      do j = 1, NY
         do i = 1, NX
            a(i,j) = 1.3
            b(i,j) = 3.4
         end do
      end do

      return
      end subroutine init

      end module acc_data

      program test2

      use acc_data
      implicit none
      integer :: i, j

      call alloc
c$acc data create(A(NX,NY), b(NX,NY))
      call init

c$acc kernels loop present (a,b) copyout(c)
      do j = 1, NY
         do i = 1, NX
            c(i,j) = a(i,j)**b(i,j)
         end do
      end do

c$acc end data

      write(*,*)sum(c(1,:))
      stop
      end
% pgf90 -acc test2.f90 -Mfixed -Minfo=accel
init:
     26, Generating present(b(:,:))
         Generating present(a(:,:))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     27, Loop is parallelizable
     28, Loop is parallelizable
         Accelerator kernel generated
         27, !$acc loop gang ! blockidx%y
         28, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 12 registers; 96 shared, 8 constant, 0 local memory bytes
             CC 2.0 : 14 registers; 0 shared, 112 constant, 0 local memory bytes
test2:
     46, Generating local(b(:100000,:1000))
         Generating local(a(:100000,:1000))
     49, Generating present(b(:,:))
         Generating present(a(:,:))
         Generating copyout(c(:,:))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     50, Loop is parallelizable
     51, Loop is parallelizable
         Accelerator kernel generated
         50, !$acc loop gang ! blockidx%y
         51, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 16 registers; 136 shared, 112 constant, 0 local memory bytes
             CC 2.0 : 19 registers; 0 shared, 192 constant, 0 local memory bytes
% a.out
    2440.103   
Warning: ieee_inexact is signaling
FORTRAN STOP


Best Regards,
Mat
Patnaik



Joined: 01 Aug 2012
Posts: 2

Posted: Wed Aug 01, 2012 3:12 pm

Mat,

Thanks, that is similar to a workaround I found. I was hoping not to have to explicitly list all the device variables in the main program, as the actual code will have hundreds. I guess I'll wait for the next update.

Also, in your example, arrays a and b are first allocated on the host. That is not really required, but it makes sense if the code is also meant to run on the host alone. I guess this is a good design practice?

Regards,
Gopal
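
A sketch, for readers on newer compilers, of one way to keep the device data inside the module. OpenACC 2.0 and later define the unstructured enter data and exit data directives, which create device copies that outlive the routine issuing them. This is not something the PGI release discussed in this thread offered; it assumes a compiler that implements OpenACC 2.0 or later. The names acc_data_sketch, cleanup and sketch, and the smaller array sizes, are made up for illustration. The !$acc sentinel in column 1 is equivalent to the c$acc form used elsewhere in this thread.

```fortran
! Hypothetical sketch, assuming OpenACC 2.0+ (enter data / exit data).
! Laid out so that it is valid as fixed-form or free-form source.
      module acc_data_sketch
      implicit none
      integer, parameter :: NX = 1000, NY = 100
      real, allocatable, dimension(:,:) :: a, b, c

      contains

      subroutine alloc
      allocate (a(NX,NY), b(NX,NY), c(NX,NY))
! Create device copies of a and b that persist after alloc returns.
!$acc enter data create(a, b)
      end subroutine alloc

      subroutine init
      integer :: i, j
! a and b are already on the device, so present() succeeds here.
!$acc kernels loop present(a, b)
      do j = 1, NY
         do i = 1, NX
            a(i,j) = 1.3
            b(i,j) = 3.4
         end do
      end do
      end subroutine init

      subroutine cleanup
! Release the device copies before freeing the host arrays.
!$acc exit data delete(a, b)
      deallocate (a, b, c)
      end subroutine cleanup

      end module acc_data_sketch

      program sketch
      use acc_data_sketch
      implicit none
      integer :: i, j

      call alloc
      call init

! No enclosing data region is needed in the main program.
!$acc kernels loop present(a, b) copyout(c)
      do j = 1, NY
         do i = 1, NX
            c(i,j) = a(i,j)**b(i,j)
         end do
      end do

      write(*,*) sum(c(1,:))
      call cleanup
      end program sketch
```

With this layout the main program never names a or b in a data directive; only the module's alloc and cleanup routines do. Compiled without -acc, the directive lines are plain comments and the program runs on the host.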
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Wed Aug 01, 2012 3:44 pm

Quote:
I guess this is a good design practice?
I think so. One of the points of using directives is that you can turn them off. You could probably insert some logic in the code so that it would work either way, but I don't think it would be worth it.

- Mat
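
To illustrate the point about turning directives off, here is a minimal sketch (the program name host_or_device and the 1-D arrays are made up for illustration). Because the host arrays are always allocated, the same source should run either way: built with -acc, the loops are offloaded and a and b live only in the data region on the device; built without it, the !$acc lines are ordinary comments and the loops run on the host.

```fortran
! Hypothetical sketch: one source file, host-only or accelerated.
! Laid out so that it is valid as fixed-form or free-form source.
      program host_or_device
      implicit none
      integer, parameter :: N = 1000
      real, allocatable, dimension(:) :: a, b, c
      integer :: i

! Host allocation is what keeps the host-only build working.
      allocate (a(N), b(N), c(N))

! With -acc: a and b are created on the device, c is copied back.
! Without -acc: these directive lines are ignored as comments.
!$acc data create(a, b) copyout(c)

!$acc kernels loop present(a, b)
      do i = 1, N
         a(i) = 1.3
         b(i) = 3.4
      end do

!$acc kernels loop present(a, b, c)
      do i = 1, N
         c(i) = a(i)**b(i)
      end do

!$acc end data

      write(*,*) c(1)
      deallocate (a, b, c)
      end program host_or_device
```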
All times are GMT - 7 Hours