PGI User Forum

APM PGI 10.5 - !$acc region
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Sun May 16, 2010 4:41 pm    Post subject: APM PGI 10.5 - !$acc region

I have a question.
Code:
!$acc region copyin(Cs), copyout(Ds)
Ds = Cs
DO i = 1, n
  DO j = 1, n
   ...
  ENDDO
ENDDO
!$acc end region

My question is whether the statement Ds = Cs is executed on the accelerator. If not, should I do something like the following?

Code:
!$acc region copyin(Cs), copyout(Ds)
DO i = 1, n
  Ds(i,:) = Cs(i,:)
  DO j = 1, n
   ...
  ENDDO
ENDDO
!$acc end region


Thanks,
Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon May 17, 2010 3:32 pm

Hi Tuan,

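Since "Ds=Cs" is a whole-array assignment, the compiler treats it as an implied do loop over every element, so it will be accelerated along with the explicit loops.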

Code:
% cat test.f90
program test

real, dimension(1024,1024) :: Ds, Cs
integer :: i,j,n

n = 1024
Cs = 0.231

!$acc region copyin(Cs), copyout(Ds)
Ds = Cs
DO i = 1, n
  DO j = 1, n
     Ds(i,j) = Ds(i,j) * (i+j)
  ENDDO
ENDDO
!$acc end region

print *, Cs(1,1), Ds(1,1), Cs(1024,1024), Ds(1024,1024)

end program test

% pgf90 -ta=nvidia -Minfo=accel test.f90 -V10.5
test:
      9, Generating copyin(cs(:,:))
         Generating copyout(ds(:,:))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     10, Loop is parallelizable    <<<<< Implied Do loop for Ds=Cs
         Accelerator kernel generated
         10, !$acc do parallel, vector(16)
             CC 1.0 : 6 registers; 24 shared, 32 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 6 registers; 24 shared, 32 constant, 0 local memory bytes; 100 occupancy
     11, Loop is parallelizable
     12, Loop is parallelizable
         Accelerator kernel generated
         11, !$acc do parallel, vector(16)
         12, !$acc do parallel, vector(16)
             CC 1.0 : 8 registers; 24 shared, 32 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 8 registers; 24 shared, 32 constant, 0 local memory bytes; 100 occupancy


Hope this helps,
Mat
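
The point of the reply above is that a whole-array assignment is just a loop nest written compactly. The following is a minimal sketch of that equivalence, not code from the thread: the program name, the extra arrays ds_array, ds_rows and ds_loop, and the final check are illustrative additions. It compares the whole-array form, the row-by-row form from the original question, and a fully explicit loop nest. The accelerator directives are left out so it builds with any Fortran compiler.

```fortran
! Sketch only: three ways to write the same copy of Cs into Ds.
! Array size and the value 0.231 mirror test.f90; everything else is illustrative.
program array_assign_sketch
  implicit none
  integer, parameter :: n = 1024
  real, dimension(n,n) :: cs, ds_array, ds_rows, ds_loop
  integer :: i, j

  cs = 0.231

  ! Form 1: whole-array assignment (the "implied do loop" in the reply).
  ds_array = cs

  ! Form 2: one array-section assignment per row, as in the original question.
  do i = 1, n
    ds_rows(i,:) = cs(i,:)
  end do

  ! Form 3: fully explicit loop nest.
  ! j is the outer loop so the inner loop walks the contiguous first index.
  do j = 1, n
    do i = 1, n
      ds_loop(i,j) = cs(i,j)
    end do
  end do

  ! All three forms must leave identical data in the destination array.
  if (all(ds_array == ds_rows) .and. all(ds_array == ds_loop)) then
    print *, 'all three forms agree'
  else
    print *, 'mismatch between forms'
  end if
end program array_assign_sketch
```

To see how the compiler handles each form on the accelerator, one could wrap any of them in the !$acc region / !$acc end region pair from test.f90 and read the -Minfo=accel output, as Mat does above.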
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Tue May 18, 2010 8:41 am

Thanks, Mat.
I forgot to check the compiler's output.

Tuan
TheMatt



Joined: 06 Jul 2009
Posts: 317
Location: Greenbelt, MD

Posted: Tue May 18, 2010 8:46 am

Mat,

I know you knew this was coming: how can I get those nice cubin-like status messages out of pgfortran?
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Wed May 19, 2010 9:17 am

Hi Matt,
Quote:

CC 1.0 : 6 registers; 24 shared, 32 constant, 0 local memory bytes; 100 occupancy
CC 1.3 : 6 registers; 24 shared, 32 constant, 0 local memory bytes; 100 occupancy

These are new in 10.5. We took your advice and added the output of "--ptxas-options=-v" to the "-Minfo=accel" messages.

Sorry, I should have updated your post to let you know.

Thanks,
Mat
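
For readers wondering where an occupancy figure like the one quoted above can come from, here is a rough back-of-the-envelope sketch. It is not the compiler's actual calculation. It assumes the second kernel runs with a 16 x 16 thread block (suggested by its two vector(16) clauses), takes the 8 registers and 24 shared bytes from the -Minfo lines, and uses per-multiprocessor limits for compute capability 1.0 and 1.3 as recalled from NVIDIA's published tables; treat those limits as assumptions to verify. Register and shared memory allocation granularity is ignored.

```fortran
! Sketch only: simplified occupancy estimate for the second kernel in the thread.
! Hardware limits below are assumptions; allocation granularity is ignored.
program occupancy_sketch
  implicit none
  integer, parameter :: threads_per_block = 16 * 16  ! assumed from vector(16) x vector(16)
  integer, parameter :: regs_per_thread   = 8        ! from the -Minfo=accel output
  integer, parameter :: shared_per_block  = 24       ! bytes, from the -Minfo=accel output

  ! Arguments: label, max threads, registers, shared bytes, max blocks per multiprocessor.
  call report('CC 1.0', 768, 8192, 16384, 8)
  call report('CC 1.3', 1024, 16384, 16384, 8)

contains

  subroutine report(label, max_threads, max_regs, max_shared, max_blocks)
    character(len=*), intent(in) :: label
    integer, intent(in) :: max_threads, max_regs, max_shared, max_blocks
    integer :: blocks

    ! The number of resident blocks is capped by whichever resource runs out first.
    blocks = min(max_blocks, &
                 max_threads / threads_per_block, &
                 max_regs / (regs_per_thread * threads_per_block), &
                 max_shared / shared_per_block)

    ! Occupancy = resident threads as a percentage of the multiprocessor's maximum.
    print '(a,a,i0,a,i0,a)', label, ': ', blocks, ' resident blocks, ', &
          (100 * blocks * threads_per_block) / max_threads, ' percent occupancy'
  end subroutine report
end program occupancy_sketch
```

Under these assumptions the thread limit is the binding constraint in both cases (3 blocks on CC 1.0, 4 on CC 1.3), which is consistent with the full occupancy reported in the messages.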
All times are GMT - 7 Hours