PGI User Forum

matrix multiplication with some modification

 
cyfengMIT



Joined: 07 Mar 2012
Posts: 24

Posted: Fri Oct 26, 2012 12:03 am    Post subject: matrix multiplication with some modification

Dear all:

I modified the matrix multiplication example from the "CUDA Fortran Programming Guide and Reference", as shown below, so that it works for arbitrary dimensions.
Something seems to be wrong when I test it with two matrices, Adev(568,568) and Bdev(568,2902).
There are always errors larger than 1.E-3. :(
The grid and block dimensions were:
Code:

dimGrid = dim3( (568-1)/16+1, (2902-1)/16+1, 1 )
dimBlock = dim3( 16, 16, 1 )


How should I modify my code?
Thank you in advance.

Feng

Code:

    attributes(global) subroutine gpu_cal_coef( Adev, Bdev, Cdev, NB, M, L)
    implicit none
       integer, value :: NB, M, L
       real*8, device :: Adev(NB,M), Bdev(M,L), Cdev(NB,L)
       integer, device :: i, j, kb, k, tx, ty
       real*8, shared :: Asub(16,16), Bsub(16,16)
       real*8, device :: Cij

! Start execution, first get my thread indices
       tx = threadidx%x
       ty = threadidx%y

! This thread computes C(i,j) = sum(A(i,:) * B(:,j))
       i = (blockidx%x-1)*16 + tx
       j = (blockidx%y-1)*16 + ty

       Cij = 0.d0

       do kb = 1, M, 16
          if (i<=NB .and. kb+ty-1<=M)then        !<--modification
            Asub(tx,ty) = Adev(i,kb+ty-1)
          else
            Asub(tx,ty) = 0.d0                            !<--modification
          end if
       
          if (kb+tx-1<=M .and. j<=L)then          !<--modification
            Bsub(tx,ty) = Bdev(kb+tx-1,j)
          else
            Bsub(tx,ty) = 0.d0                            !<--modification
          end if

          call syncthreads()

          do k = 1,16
             Cij = Cij + Asub(tx,k)*Bsub(k,ty)
          enddo
          call syncthreads()

       enddo
       Cdev(i,j) = Cij

   end subroutine gpu_cal_coef
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Fri Oct 26, 2012 3:11 pm    Post subject:

Hi Feng,

567/17=33 blocks. 33 blocks times 16 threads per block is only 528 elements.

Since this is integer division, if the number of elements is not evenly divisible by the number of threads in a block, you need to round up.

Code:
dimGrid = dim3( (568+15)/16, (2902+15)/16, 1 )


You then need to make sure you have guards which skip the excess threads (which it looks like you have).

- Mat
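
As a quick check of the rounding arithmetic discussed above, here is a minimal host-side sketch in plain Fortran (no GPU needed). The helper name `blocks_needed` is hypothetical and is not part of CUDA Fortran. It compares the `(n-1)/16+1` form from the original post with the `(n+15)/16` form; for the sizes in this thread both appear to give the same block count.

```fortran
! Sketch only: plain Fortran, no GPU required.
module grid_util
  implicit none
contains
  ! Smallest number of blocks of size `tile` that covers n elements.
  ! blocks_needed is a hypothetical helper name, not a CUDA Fortran API.
  pure integer function blocks_needed(n, tile) result(nblk)
    integer, intent(in) :: n, tile
    ! Integer division truncates, so add tile-1 before dividing.
    nblk = (n + tile - 1) / tile
  end function blocks_needed
end module grid_util

program check_grid
  use grid_util
  implicit none
  integer, parameter :: tile = 16
  ! Both formulations give the same block count for these sizes.
  print *, (568 - 1)/tile + 1,  blocks_needed(568, tile)    ! 36 and 36
  print *, (2902 - 1)/tile + 1, blocks_needed(2902, tile)   ! 182 and 182
  ! Threads launched per dimension: 576 and 2912, both more than needed,
  ! so the kernel must guard against the excess threads.
  print *, blocks_needed(568, tile)*tile, blocks_needed(2902, tile)*tile
end program check_grid
```

Because 576 > 568 and 2912 > 2902, some threads in the last block of each dimension fall outside the matrices, which is why every global-memory access in the kernel needs a bounds check.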
cyfengMIT



Joined: 07 Mar 2012
Posts: 24

Posted: Fri Oct 26, 2012 7:20 pm    Post subject:

Hi, Mat

Thank you for the reminder.
The number of blocks is (567/16)+1 = 36, which gives 36*16 = 576 threads. That is already more than 567, so the grid covers every row.
I have no idea what went wrong. :(

Feng

mkcolg wrote:


567/17=33 blocks. 33 blocks times 16 threads per block is only 528 elements.

cyfengMIT



Joined: 07 Mar 2012
Posts: 24

Posted: Tue Oct 30, 2012 6:37 am    Post subject:

Hi Mat,

I found that the store Cdev(i,j)=Cij needs to be guarded too. That is:
Code:

if( i<=NB .and. j<=L )then
  Cdev(i,j)=Cij
end if

With this change, all the errors are less than 1.E-6.

Feng
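
For readers who want to try the guarded scheme without a GPU, the following is a CPU-only sketch in plain Fortran that emulates the block and thread loops of the kernel, including the guarded store. The names `tiled_matmul`, `TILE`, and `dp` are hypothetical, and the tile size is reduced from 16 to 4 so that small, non-multiple dimensions exercise the guards. A likely explanation for the original errors, given Fortran's column-major layout, is that a thread with i > NB computes Cij = 0 (its row of Asub is zero-padded) and an unguarded `Cdev(i,j)` store then lands on a valid element of the next column; this is an inference from the code, not something stated in the thread.

```fortran
! CPU-only sketch of the guarded tiled scheme; plain Fortran, no GPU needed.
! tiled_matmul, TILE and dp are hypothetical names used for illustration.
module tiled_emulation
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: TILE = 4   ! the thread uses 16; 4 keeps the test small
contains
  subroutine tiled_matmul(A, B, C, NB, M, L)
    integer, intent(in) :: NB, M, L
    real(dp), intent(in)  :: A(NB,M), B(M,L)
    real(dp), intent(out) :: C(NB,L)
    real(dp) :: Asub(TILE,TILE), Bsub(TILE,TILE), Cacc(TILE,TILE)
    integer :: bx, by, tx, ty, i, j, kb, k

    do by = 1, (L + TILE - 1)/TILE          ! emulated blockidx%y
      do bx = 1, (NB + TILE - 1)/TILE       ! emulated blockidx%x
        Cacc = 0.d0
        do kb = 1, M, TILE
          ! Load phase: each emulated thread fills one tile entry and
          ! zero-pads anything that falls outside A or B.
          do ty = 1, TILE
            do tx = 1, TILE
              i = (bx-1)*TILE + tx
              j = (by-1)*TILE + ty
              Asub(tx,ty) = 0.d0
              Bsub(tx,ty) = 0.d0
              if (i <= NB .and. kb+ty-1 <= M) Asub(tx,ty) = A(i,kb+ty-1)
              if (kb+tx-1 <= M .and. j <= L)  Bsub(tx,ty) = B(kb+tx-1,j)
            end do
          end do
          ! Compute phase: runs after the whole load loop, which stands in
          ! for the syncthreads() barrier in the kernel.
          do ty = 1, TILE
            do tx = 1, TILE
              do k = 1, TILE
                Cacc(tx,ty) = Cacc(tx,ty) + Asub(tx,k)*Bsub(k,ty)
              end do
            end do
          end do
        end do
        ! Guarded store: excess threads must not write outside C.
        do ty = 1, TILE
          do tx = 1, TILE
            i = (bx-1)*TILE + tx
            j = (by-1)*TILE + ty
            if (i <= NB .and. j <= L) C(i,j) = Cacc(tx,ty)
          end do
        end do
      end do
    end do
  end subroutine tiled_matmul
end module tiled_emulation

program check_tiled
  use tiled_emulation
  implicit none
  integer, parameter :: NB = 5, M = 7, L = 9   ! deliberately not multiples of TILE
  real(dp) :: A(NB,M), B(M,L), C(NB,L)
  call random_number(A)
  call random_number(B)
  call tiled_matmul(A, B, C, NB, M, L)
  print *, 'max abs difference vs matmul:', maxval(abs(C - matmul(A, B)))
end program check_tiled
```

The reported difference should be at the level of double-precision rounding. Removing the `i <= NB .and. j <= L` test on the store would make the CPU version write outside `C`, which mirrors the out-of-range writes the guard prevents on the device.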
All times are GMT - 7 Hours