PGI User Forum

How to operate variables on GPU

 
hzhcapricorn



Joined: 21 Mar 2011
Posts: 6

Posted: Sat Apr 09, 2011 1:27 am    Post subject: How to operate variables on GPU

I have two variables on the GPU:
real, device, allocatable:: a_d(:,:), b_d(:,:)

which I use in a calculation with cuFFT. I want to compute a_d=b_d*2, but the compiler reports:

PGF90-S-0519-More than one device-resident object in assignment

For now I transfer b_d to a host variable, multiply it by 2, and then copy the result to a_d. But that costs too much time in data transfer. Can I do this directly on the GPU?
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Mon Apr 11, 2011 11:41 am    Post subject: How to operate variables on GPU

Hi hzhcapricorn,

Device variables can only be operated on within device code, so you need to either write a device kernel to perform the operation or use the PGI Accelerator Model directives to have the compiler write one for you.

Quote:
For now I transfer b_d to a host variable, multiply it by 2, and then copy the result to a_d. But that costs too much time in data transfer. Can I do this directly on the GPU?
Yes. Here are two examples:
Code:

module mul2_mod
   use cudafor
   real, device, allocatable:: a_d(:,:), b_d(:,:)
   real, allocatable:: a(:,:), b(:,:)

contains

   attributes(global) subroutine mymul2 (N,M)
      integer, value :: N,M
      integer :: idx, idy
      ! Global 1-based indices of this thread in x and y
      idx = (blockidx%x-1)*blockdim%x + threadidx%x
      idy = (blockidx%y-1)*blockdim%y + threadidx%y
      ! Skip threads that fall outside the N x M array
      if (idx.le.N.and.idy.le.M) then
        a_d(idx,idy)=b_d(idx,idy)*2.0
      endif
   end subroutine mymul2


end module mul2_mod

program mul2
   use mul2_mod
   integer :: N,M
   type(dim3) :: grid, block
   N=1024
   M=1024

   allocate(a(N,M), a_d(N,M))
   allocate(b(N,M), b_d(N,M))
   grid = dim3(N/16,M/16,1)
   block = dim3(16,16,1)

   b=1.0
   b_d=b
   call mymul2<<<grid,block>>>(N,M)
   a=a_d
   print *, a(1,1)

end program mul2
% pgf90 mul2.cuf ; a.out
    2.000000   
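
The bounds check in mymul2 only matters once N or M is not a multiple of the block size. Here is a minimal sketch of that case, assuming the same 16x16 blocks and made-up sizes N=1000, M=500. The grid is rounded up so the partial blocks at the array edges still get launched, and the guard keeps the extra threads from writing out of bounds. The names mul2_ceil_mod and mul2_ceil are hypothetical, just for illustration.

```fortran
module mul2_ceil_mod
   use cudafor
   implicit none
   real, device, allocatable :: a_d(:,:), b_d(:,:)

contains

   attributes(global) subroutine mymul2(N, M)
      integer, value :: N, M
      integer :: idx, idy
      idx = (blockidx%x-1)*blockdim%x + threadidx%x
      idy = (blockidx%y-1)*blockdim%y + threadidx%y
      ! Threads past the array edge do nothing
      if (idx <= N .and. idy <= M) then
         a_d(idx,idy) = b_d(idx,idy)*2.0
      endif
   end subroutine mymul2

end module mul2_ceil_mod

program mul2_ceil
   use mul2_ceil_mod
   implicit none
   integer :: N, M
   type(dim3) :: grid, block
   real, allocatable :: a(:,:), b(:,:)

   N = 1000      ! hypothetical sizes, deliberately not multiples of 16
   M = 500
   allocate(a(N,M), b(N,M), a_d(N,M), b_d(N,M))

   block = dim3(16,16,1)
   ! Ceiling division: round the block count up in each dimension
   grid = dim3((N + block%x - 1)/block%x, (M + block%y - 1)/block%y, 1)

   b = 1.0
   b_d = b
   call mymul2<<<grid,block>>>(N, M)
   a = a_d
   ! Check the last element and the whole array, not just a(1,1)
   print *, a(N,M), all(a == 2.0)

end program mul2_ceil
```

Checking the last element (or the whole array) rather than only a(1,1) is a cheap way to catch indexing mistakes in the kernel.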

% cat mul2_acc.cuf

Code:

program mul2

   real, device, allocatable:: a_d(:,:), b_d(:,:)
   real, allocatable:: a(:,:), b(:,:)

   allocate(a(1024,1024), a_d(1024,1024))
   allocate(b(1024,1024), b_d(1024,1024))

   b=1.0
   b_d=b
!$acc region
   a_d=b_d*2
!$acc end region
   a=a_d
   print *, a(1,1)

end program mul2
   
% pgf90 -ta=nvidia,cuda3.2 -Minfo=accel mul2_acc.cuf ; a.out
mul2:
     14, Loop is parallelizable
         Accelerator kernel generated
         14, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             !$acc do parallel, vector(16) ! blockidx%y threadidx%y
    2.000000   
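
A third option, if your compiler version supports CUF kernel directives, is to keep the loops in host code and let the compiler generate the kernel from them. This is a minimal sketch, assuming the same 1024x1024 arrays as in the examples; the program name mul2_cuf is made up for illustration.

```fortran
program mul2_cuf
   use cudafor
   implicit none
   integer, parameter :: N = 1024, M = 1024
   real, device, allocatable :: a_d(:,:), b_d(:,:)
   real, allocatable :: a(:,:), b(:,:)
   integer :: i, j

   allocate(a(N,M), b(N,M), a_d(N,M), b_d(N,M))
   b = 1.0
   b_d = b

   ! The compiler builds a kernel from the two tightly nested loops;
   ! <<<*,*>>> leaves the grid and block shapes up to the compiler.
   !$cuf kernel do(2) <<<*,*>>>
   do j = 1, M
      do i = 1, N
         a_d(i,j) = b_d(i,j)*2.0
      end do
   end do

   a = a_d
   print *, a(1,1)

end program mul2_cuf
```

As with the other two versions, the data stays on the device for the multiply, so the only transfers are the initial b_d=b and the final a=a_d.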




Hope this helps,
Mat
All times are GMT - 7 Hours