PGI User Forum

3d array problem compiling

 
mmaurier



Joined: 09 Sep 2009
Posts: 1

Posted: Fri Apr 02, 2010 1:52 pm    Post subject: 3d array problem compiling

I'm running Ubuntu 9.04 with CUDA 2.3, Nvidia driver 190.53, and a GeForce 9800 GX2 card.
I get the following compile error:

mmaurier@mmaurier-desktop:~/src/vcg/vcg-float/cuda03$ pgf90 -O3 -Mcuda test02.f90
NOTE: your trial license will expire in 12 days, 7.24 hours.
NOTE: your trial license will expire in 12 days, 7.24 hours.
/tmp/pgcudaforGP7eGFJZMPTs.gpu(26): error: identifier "pgf90_dev_common" is undefined

1 error detected in the compilation of "/tmp/pgnvdkQ7eEkrwubFO.nv0".
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (test02.f90: 21)
PGF90/x86-64 Linux 10.3-0: compilation aborted

I only have 12 days left to test this code...
Thank you
Code:
module vars

   integer, parameter :: NR=50

end module vars

module kernel
  use cudafor

contains

attributes(global) subroutine setArray_kernel(thing)
   real, device, dimension(50,50,50) :: thing
   integer,device :: it,yt,zt

   it = threadidx%x + (blockidx%x-1) * blockdim%x
     yt = threadidx%y + (blockidx%y-1) * blockdim%y
     zt = threadidx%z + (blockidx%z-1) * blockdim%z
     thing(it,yt,zt) = 100.0

end subroutine setArray_kernel

end module kernel

! ----------------------------------------------------------------

subroutine setArray(x)

     use cudafor   
    use kernel
     type(dim3) :: dimGrid, dimBLock
     real, dimension(50,50,50) :: x
     real, allocatable, device, dimension(:,:,:) :: a
     integer nt,ng     

     write(0,*) "about allocate"
     allocate(a(50,50,50))
     write(0,*) "after allocate"
   ! .. 100 threads
     dimGrid =  dim3(5,5,5)
     dimBlock = dim3(10,10,10)
     write(0,*) "before a=x"
     a= x
     write(0,*) "after a=x"
     call setArray_kernel<<<dimGrid,dimBlock>>> (a)
     write(0,*) "after kernel call"
     i = cudathreadsynchronize()
     write(0,*) "about to do x=a"
     x= a
     deallocate(a)

end subroutine setArray

! ----------------------------------------------------------------

program test
   use vars
     real ha(NR,NR,NR)

     ha=9
     print *,ha(1,1,1),ha(2,2,2), ha(50-1,1,1),ha(50,2,2)
     call setArray(ha)
     print *,ha(1,1,1),ha(2,2,2), ha(50-1,1,1),ha(50,2,2)
     stop
end program
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Fri Apr 02, 2010 3:41 pm    Post subject: Re: 3d array problem compiling

Hi mmaurier,

I see two issues here. The first is a compiler bug: using "blockidx%z" in the kernel triggers the undefined identifier error. I've sent a report to our engineers (TPR#16806).

The second issue is that you're using more than the maximum number of threads per block. While the max varies from card to card, on my Tesla the max is 512. However, you're using 1000 (i.e. "dimBlock = dim3(10,10,10)" gives 10 x 10 x 10 threads per block).
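
To make the arithmetic concrete, here is a small host-only sketch in plain Fortran (no GPU or CUDA Fortran required). It multiplies out the block dimensions and compares them to a limit. The limit of 512 is simply the value quoted above for one card and is an assumption here; the program and variable names are illustrative only.

```fortran
program block_size_check
   implicit none
   ! Assumed per-block thread limit: the value quoted above for one card.
   ! The real limit varies from device to device.
   integer, parameter :: max_threads_per_block = 512
   integer :: original(3), suggested(3)

   original  = (/ 10, 10, 10 /)   ! dimBlock = dim3(10,10,10)
   suggested = (/ 10, 10, 1 /)    ! dimBlock = dim3(10,10,1)

   call report("original ", original)
   call report("suggested", suggested)

contains

   subroutine report(label, dims)
      character(len=*), intent(in) :: label
      integer, intent(in) :: dims(3)
      integer :: nthreads

      ! Threads per block is the product of the three block dimensions.
      nthreads = product(dims)
      if (nthreads > max_threads_per_block) then
         print *, label, nthreads, " threads per block: exceeds limit"
      else
         print *, label, nthreads, " threads per block: within limit"
      end if
   end subroutine report

end program block_size_check
```

With these values the original block shape comes to 1000 threads, which is over the assumed limit, while a 10 x 10 x 1 block comes to 100.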

Because of these issues, I would suggest mapping the threads to the first two dimensions of the array and having each thread loop through the z dimension. For example:
Code:

% cat thing.cuf
module vars

   integer, parameter :: NR=50

end module vars

module kernel
  use cudafor

contains

attributes(global) subroutine setArray_kernel(thing)
   real, device, dimension(50,50,50) :: thing
   integer, device :: it,yt,zt

   it = threadidx%x + (blockidx%x-1) * blockdim%x
     yt = threadidx%y + (blockidx%y-1) * blockdim%y
     do zt=1,50
        thing(it,yt,zt) = 100.0
     enddo

end subroutine setArray_kernel

end module kernel

! ----------------------------------------------------------------

subroutine setArray(x)

     use cudafor
     use kernel
     type(dim3) :: dimGrid, dimBLock
     real, dimension(50,50,50) :: x
     real, allocatable, device, dimension(:,:,:) :: a
     integer nt,ng

     write(0,*) "about allocate"
     allocate(a(50,50,50))
     write(0,*) "after allocate"
   ! .. 100 threads per block (10x10) in a 5x5 grid of blocks
     dimGrid =  dim3(5,5,1)
     dimBlock = dim3(10,10,1)
     write(0,*) "before a=x"
     a= x
     write(0,*) "after a=x"
     call setArray_kernel<<<dimGrid,dimBlock>>> (a)
     write(0,*) "after kernel call"
     i = cudathreadsynchronize()
     write(0,*) "about to do x=a"
     x= a
    deallocate(a)

end subroutine setArray

! ----------------------------------------------------------------

program test
   use vars
     real ha(NR,NR,NR)

     ha=9
     print *,ha(1,1,1),ha(2,2,2), ha(50-1,1,1),ha(50,50,50)
     call setArray(ha)
     print *,ha(1,1,1),ha(2,2,2), ha(50-1,1,1),ha(50,50,50)
     stop
end program
% pgf90 thing.cuf -o thing.out
% thing.out
    9.000000        9.000000        9.000000        9.000000
 about allocate
 after allocate
 before a=x
 after a=x
 after kernel call
 about to do x=a
    100.0000        100.0000        100.0000        100.0000
FORTRAN STOP
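
As a sanity check on the index mapping used in the kernel above, the following host-only sketch in plain Fortran emulates every (block, thread) pair sequentially and counts how often each array element is written. It is not CUDA Fortran and does not run on the GPU; the program name, the `hits` array, and the bounds guard are illustrative additions that are not part of the code above.

```fortran
program mapping_check
   implicit none
   integer, parameter :: n = 50
   integer, parameter :: gridx = 5, gridy = 5      ! dimGrid  = dim3(5,5,1)
   integer, parameter :: blockx = 10, blocky = 10  ! dimBlock = dim3(10,10,1)
   integer :: hits(n,n,n)
   integer :: bx, by, tx, ty, it, yt, zt

   hits = 0
   ! Emulate every (block, thread) pair sequentially on the host.
   do by = 1, gridy
      do bx = 1, gridx
         do ty = 1, blocky
            do tx = 1, blockx
               ! Same index formulas as the kernel (indices are 1-based).
               it = tx + (bx-1) * blockx
               yt = ty + (by-1) * blocky
               ! Guard not present in the kernel above; it would matter if n
               ! were not an exact multiple of the block size.
               if (it <= n .and. yt <= n) then
                  ! Each emulated thread walks the whole z dimension.
                  do zt = 1, n
                     hits(it,yt,zt) = hits(it,yt,zt) + 1
                  end do
               end if
            end do
         end do
      end do
   end do

   if (all(hits == 1)) then
      print *, "every element written exactly once"
   else
      print *, "mapping has gaps or overlaps"
   end if
end program mapping_check
```

Because 5 blocks of 10 threads cover exactly 50 indices in x and in y, the guard never rejects a thread here. If the array extent were not a multiple of the block size, the grid would need to be rounded up and a guard like this one added to the kernel itself.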


Hope this helps,
Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue May 04, 2010 4:24 pm    Post subject: Re: 3d array problem compiling

Hi mmaurier,

FYI, I've verified that TPR#16806 will be fixed in release 10.5.

Thanks,
Mat
All times are GMT - 7 Hours