PGI User Forum

Obtaining source code from webinars (e.g. saxby)
gayle



Joined: 04 Jul 2012
Posts: 7

Posted: Wed Jul 11, 2012 5:34 am    Post subject: Thank you.

Thanks Mat.

-Gayle
gayle



Joined: 04 Jul 2012
Posts: 7

Posted: Mon Jul 16, 2012 12:46 pm    Post subject: codes in tar file

Forgive me, but I am new to some of the terminology. I was hoping to get the saxby code from the webinar, but when I look at the makefile, I do not see anything called saxby. Is "smoothing" the same thing as saxby?

Thanks.
-Gayle
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Jul 16, 2012 2:42 pm    Post subject:

Hi Gayle,

These are the codes he used in the last OpenACC webinar/training he did for ISC. I asked him about Saxpy, but he doesn't think he had a standalone example for that one, or if he did, he's not sure where it is now.

Hence, I went ahead and wrote up a very basic Saxpy example using OpenACC. Hopefully, I'm close to what he showed. Let me know if you need more info.

- Mat

Code:
% cat saxpy.f90
subroutine saxpy (A,X,Y,N)
   real(4) :: A, X(N), Y(N)
   integer :: N, i
!$acc kernels
   do i = 1,N
      X(i) = A * X(i) + Y(i)   ! updates X in place; BLAS saxpy would store the result in Y
   enddo
!$acc end kernels
end subroutine

program test

   real, allocatable, dimension(:) :: X, Y
   integer :: N
   real :: A, X1

   N=1024
   A=1.012
   allocate(X(N), Y(N))
   call random_seed()
   call random_number(X)
   call random_number(Y)
   print *, A, X(1), Y(1)
   X1=X(1)
   call saxpy(A,X,Y,N)
   print *, X(1), A*X1+Y(1)
   deallocate(X,Y)

end program test
   
% pgf90 saxpy.f90 -acc -Minfo -V12.6
saxpy:
      4, Generating copyin(y(:n))
         Generating copy(x(:n))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
      5, Loop is parallelizable
         Accelerator kernel generated
          5, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
             CC 1.0 : 8 registers; 48 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 10 registers; 0 shared, 64 constant, 0 local memory bytes
% a.out
    1.012000       0.6050169       0.7534078   
    1.365685        1.365685   
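
The -Minfo output above shows the compiler choosing the data movement on its own: Y is only read, so it is copied in, while X is read and written, so it is copied in and back out. As a rough sketch, assuming the same update as the saxpy routine above, the same thing can be spelled out with explicit data clauses. The name saxpy_explicit is made up for illustration and is not from the webinar.

```fortran
subroutine saxpy_explicit (A,X,Y,N)
   implicit none
   integer :: N, i
   real(4) :: A, X(N), Y(N)
   ! Assumption: same update as the saxpy example above (result stored in X).
   ! copyin(Y): Y is only read on the device, so it is never copied back.
   ! copy(X):   X is read and written, so it is copied to the device and back.
!$acc kernels copyin(Y(1:N)) copy(X(1:N))
   do i = 1,N
      X(i) = A * X(i) + Y(i)
   enddo
!$acc end kernels
end subroutine
```

It takes the same arguments as saxpy, so it can be called from the test program in its place. Without -acc the directives are treated as ordinary comments and the loop simply runs on the host.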
All times are GMT - 7 Hours