PGI User Forum
Trouble Getting Started CUDA/PGI Fortran
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Thu Nov 01, 2012 2:16 pm

You can FTP it to us. Send a note to customer service asking for instructions.

- Mat
Dolf



Joined: 22 Mar 2012
Posts: 127

Posted: Thu Nov 01, 2012 2:38 pm

Here is the code:

module kernels
  use cudafor
  implicit none

contains

  attributes (global) subroutine mult_kernel (a,b,c,nx,ny)

    implicit none
    integer, value :: nx,ny
    integer :: i,j,k
    real(8) :: sum
    real(8) :: a(nx,ny),b(nx,ny),c(nx,ny)

    i = (blockidx%x - 1) * blockDim%x + threadidx%x
    j = (blockidx%y - 1) * blockDim%y + threadidx%y
    print*, 'nx'
    if(i <= nx .AND. j <= ny) then
      sum = 0
      do k=1,ny
        sum = sum + (a(i,k) * b(k,j))
      enddo
      c(i,j) = sum
    endif

  end subroutine mult_kernel

end module kernels


program prog

  use kernels
  implicit none
  integer :: istat
  real(8), device :: cDev(nx,ny)
  real(8) :: c(nx,ny)
  type(dim3) :: grid,threads
  integer :: nx,ny,nx4,ny4
  real(8), device, allocatable, dimension (:,:) :: bearxDev,bearyDev,bearx4Dev,beary4Dev
  nx = 306
  ny = 306

  nx4 = nx/4
  ny4 = ny/4

  allocate (bearxDev(nx,ny),bearyDev(nx,ny),bearx4Dev(nx4,ny4),beary4Dev(nx4,ny4), STAT=istat)
  if (istat .ne. 0) write(*,*) 'error allocating bearxDev and bearyDev'
  if (istat .eq. 0) write(*,*) 'allocating bearx and beary successful'
  bearxDev(1:nx,1:ny) = 3.0
  bearyDev(1:nx,1:ny) = 2.0
  write(*,*) 'assignment of bearx and beary successful'
  threads = dim3(32,16,1)
  grid = dim3 (ceiling(real(nx)/threads%x),&
               ceiling(real(ny)/threads%y),1)
  call mult_kernel<<<grid,threads>>>(bearxDev,bearyDev,cDev,nx,ny)
  istat = cudagetlasterror()
  write(*,*) 'cudalasterror =' , istat
  istat = cudaThreadSynchronize()

  c(1:nx,1:ny) = cDev
  deallocate(bearxDev,bearyDev,bearx4Dev,beary4Dev)
  write(*,*) 'c(306,306) = ' ,c(nx,ny)
  write(*,*) 'the end'
end program prog

Here is what I get when I run in release mode:

0: ALLOCATE: 0 bytes requested; not enough memory: 0(no error)
Press any key to continue . . .
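
The launch configuration in the program above rounds the grid up so that every element gets a thread. The following is a minimal host-side sketch of that arithmetic in standard Fortran, not part of the posted code; the module and function names (launch_config, blocks_needed, global_index) are hypothetical and chosen only for illustration.

```fortran
module launch_config
  implicit none
contains

  ! Number of blocks needed to cover n elements with block_size threads
  ! per block: ceiling(n / block_size) done in integer arithmetic.
  pure integer function blocks_needed(n, block_size) result(nblocks)
    integer, intent(in) :: n, block_size
    nblocks = (n + block_size - 1) / block_size
  end function blocks_needed

  ! 1-based global index of a thread, using the same formula as the kernel:
  ! (blockidx - 1) * blockDim + threadidx
  pure integer function global_index(block_idx, block_dim, thread_idx) result(i)
    integer, intent(in) :: block_idx, block_dim, thread_idx
    i = (block_idx - 1) * block_dim + thread_idx
  end function global_index

end module launch_config
```

With nx = ny = 306 and 32x16 thread blocks, blocks_needed(306, 32) is 10 and blocks_needed(306, 16) is 20, so the grid spans 320x320 threads. The threads whose global index exceeds 306 are the ones the kernel's "if(i <= nx .AND. j <= ny)" check is there to skip.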
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Mon Nov 05, 2012 3:39 pm

The problem is with how you declare "cDev". It needs to be either an allocatable array or a fixed-size array whose bounds are parameters (named constants). As you have it now, nx and ny are still uninitialized variables when cDev is declared, hence cDev's size is wrong.

To fix:
Code:

integer :: istat
integer, parameter :: nx=306,ny=306
real(8), device :: cDev(nx,ny)
real(8) :: c(nx,ny)
type(dim3) :: grid,threads
integer :: nx4,ny4
real(8), device, allocatable, dimension (:,:) :: bearxDev,bearyDev,bearx4Dev,beary4Dev
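
The other option mentioned above is to make the array allocatable and size it once nx and ny are known. Below is a minimal sketch of that pattern in host-only standard Fortran, so it compiles without a GPU; the module and subroutine names (runtime_sized, make_result) are hypothetical. The assumption is that in CUDA Fortran the same pattern applies with the "device" attribute added to the declaration, as the original program already does for bearxDev.

```fortran
module runtime_sized
  implicit none
contains

  ! Allocate a 2-D result array after nx and ny have been set at run time.
  ! stat is 0 on success and nonzero if the allocation failed.
  subroutine make_result(nx, ny, c, stat)
    integer, intent(in) :: nx, ny
    real(8), allocatable, intent(out) :: c(:,:)
    integer, intent(out) :: stat

    allocate(c(nx, ny), stat=stat)
    if (stat == 0) c = 0.0d0   ! initialize only if the allocation worked
  end subroutine make_result

end module runtime_sized
```

The point of the sketch is ordering: the extents are assigned first and the array is sized afterwards, so the size never depends on an uninitialized variable.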


Also watch how you're accessing the "b" array. You declare it as "b(nx,ny)", but the loop indexes its first dimension with k, which runs from 1 to ny. It happens to work since nx and ny are the same, but will cause problems if nx .ne. ny.
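
One way to keep the declared shapes and the loop bounds in agreement is to pass the inner extent explicitly. The sketch below is a host-side version in standard Fortran, not the kernel from this thread; the names matmul_shapes, mult_host and nk are hypothetical and introduced only for illustration.

```fortran
module matmul_shapes
  implicit none
contains

  ! c(nx,ny) = a(nx,nk) * b(nk,ny)
  ! The inner extent nk is its own argument, so the k loop bound always
  ! matches the second dimension of a and the first dimension of b.
  subroutine mult_host(a, b, c, nx, nk, ny)
    integer, intent(in) :: nx, nk, ny
    real(8), intent(in) :: a(nx, nk), b(nk, ny)
    real(8), intent(out) :: c(nx, ny)
    integer :: i, j, k
    real(8) :: total

    do j = 1, ny
      do i = 1, nx
        total = 0.0d0
        do k = 1, nk
          total = total + a(i, k) * b(k, j)
        end do
        c(i, j) = total
      end do
    end do
  end subroutine mult_host

end module matmul_shapes
```

For non-square inputs the result can be checked against the matmul intrinsic, which is a quick way to catch a shape mismatch that equal nx and ny would hide.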

- Mat
Dolf



Joined: 22 Mar 2012
Posts: 127

Posted: Tue Nov 06, 2012 12:49 pm

Hi Mat,

Thanks for the reply. Is there a size limit on modules in CUDA Fortran?
If yes, what is that limit?
How can I find the size of a matrix in megabytes?

Thanks,
Dolf
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Tue Nov 06, 2012 4:38 pm

Quote:
Is there a size limit on modules in CUDA Fortran?
Do you mean is there a limit on the size of arrays? You're limited to the amount of memory on your device. If an individual array is >2GB, you need to add the flag "-Mlarge_arrays", have a compute capability 2.0 device, and use CUDA 4.2 or later.
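
On the question of how to find the size of a matrix in megabytes, a rough sketch in standard Fortran is to multiply the element count by the bytes per element. The module and function names below (array_bytes, matrix_bytes, bytes_to_mib) are hypothetical; the sketch assumes a compiler that supports the Fortran 2008 storage_size intrinsic and counts only the array elements, not any descriptor overhead.

```fortran
module array_bytes
  use iso_fortran_env, only: int64
  implicit none
contains

  ! Bytes occupied by the elements of a real(8) matrix.
  ! storage_size returns bits per element, so divide by 8.
  ! 64-bit integers avoid overflow for arrays larger than 2GB.
  pure function matrix_bytes(m) result(nbytes)
    real(8), intent(in) :: m(:,:)
    integer(int64) :: nbytes
    nbytes = size(m, kind=int64) * int(storage_size(m) / 8, int64)
  end function matrix_bytes

  ! Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
  pure function bytes_to_mib(nbytes) result(mib)
    integer(int64), intent(in) :: nbytes
    real(8) :: mib
    mib = real(nbytes, 8) / (1024.0d0 * 1024.0d0)
  end function bytes_to_mib

end module array_bytes
```

For the 306x306 real(8) arrays used in this thread, that works out to 306 * 306 * 8 = 749088 bytes, or roughly 0.71 MB each, far below both the device memory and the 2GB threshold.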

As far as code size, there probably is some practical limit but I'm not sure what it would be. I've seen a 4000 line kernel before so they can get big.

- Mat
All times are GMT - 7 Hours