PGI User Forum
To use atomic add
 
Mike_Texnik



Joined: 10 Oct 2010
Posts: 16

Posted: Tue Jun 26, 2012 11:38 am

toepfer wrote:
Can you send/post the main program that calls this CUDA Fortran kernel and causes the program to crash as you showed below? Without being able to reproduce the issue, it's hard to say what the problem is.

Thanks.


The main program is:
Code:

   program example27b
   use GPU_mod
   implicit real(4)(a-h,o-z)
   include "omp_lib.h"
   integer,parameter:: Nbin=10, Na=134217728
   integer i,j
   integer:: n,Ndev,dev,dist3(Nbin)
   real(4):: x(Na),dist(Nbin),dist2(Nbin),xBin(Nbin)
   integer, allocatable:: OffSet(:),StrSz(:)

   real(4), device, allocatable:: xD(:),xBinD(:)
   integer(4), device, allocatable:: dist3D(:)
   type(dim3) :: dimGrid, dimBlock

   allocate(dist3D(Nbin),xD(Na),xBinD(Nbin))

   count = 0
   Ndev=4
   call omp_set_num_threads(Ndev)

   nTr=16
   dimGrid = dim3( 16, 1, 1 )
   dimBlock = dim3( 16, 1, 1 )
   
   
   do i=1,Na
      call random_number(f)
      X(i)=f
   enddo
   
   dist=0.E0
   xBin=0.E0
   dx=1.E0/real(Nbin)
   do i=1,Nbin
      xBin(i)=(real(i)-0.5E0)*dx
   enddo
   
   start_time=omp_get_wtime()
   do i=1,Na
      ij=int(X(i)/dx)+1
      if (ij.eq.Nbin+1) print *,X(i)
      dist(ij)=dist(ij)+1.E0
   enddo

   end_time=omp_get_wtime()
   print *,"Sequential time= ",(end_time-start_time)
   
   start_time=omp_get_wtime()
 
   !$omp parallel private(i,ij,ii) reduction (+:dist2)
   
   dev=omp_get_thread_num()
   do ii=1,Na,Ndev
      i=ii+dev
      ij=int(X(i)/dx)+1
      dist2(ij)=dist2(ij)+1.E0
   enddo
   !$omp end parallel
   end_time=omp_get_wtime()
   print *,"OpenMP time= ",(end_time-start_time)   

   kol=0

   print *,' '

   start_time=omp_get_wtime()
   istat=cudaMemset(xD,0.0E0,Na)
   istat=cudaMemset(dist3D,0,Nbin)
   istat=cudaMemcpy(xD,x,Na,cudaMemcpyHostToDevice)
   end_time=omp_get_wtime()
   print *,"host to device copy time= ",(end_time-start_time)

   start_time=omp_get_wtime()
   call stat_kernel<<<dimGrid,dimBlock>>>(xD,dist3D,Na,Nbin,dx,nTr)
   end_time=omp_get_wtime()
   print *,"GPU time= ",(end_time-start_time)

   istat=cudaMemcpy(dist3,dist3D,Nbin,cudaMemcpyDeviceToHost)


   print *,' '
   deallocate(dist3D,xD,xBinD)

   end


I'm using a GeForce GT 635M and PGI Fortran 11.7 on Windows 7 x64.
toepfer



Joined: 04 Dec 2007
Posts: 50

Posted: Tue Jun 26, 2012 12:12 pm

Thanks! I will give it a try.
toepfer



Joined: 04 Dec 2007
Posts: 50

Posted: Thu Jun 28, 2012 10:56 am

I found a couple of issues with the global subroutine stat_kernel. First, the setting tx=threadidx%x+1 is incorrect. Adding one is not necessary in CUDA Fortran, because the threadidx values are one-based, not zero-based as they are in CUDA C. The second issue is how the loop bounds are computed. It's possible to overflow a 32-bit integer when computing Na*tx/nTr, given certain values of Na. There are a number of ways to fix this; for example, create a variable that is a 64-bit integer:

Code:
integer(8) :: ie


Then compute the value for the loop exit as follows:
Code:

   ie = Na
   ie = (ie * tx)/nTr

Then just use this variable in the do statement as follows:

Code:
    do i=Na*(tx-1)/nTr+1,ie
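
To make the overflow concrete, here is a minimal host-only sketch (plain Fortran, no GPU needed) that evaluates the product Na*tx in 64-bit arithmetic and reports when it no longer fits in a default 32-bit integer. Na and nTr are taken from the main program above; the module name bounds_sketch and the function upper_bound are made up for illustration, and tx is assumed to run from 1 to nTr.

```fortran
module bounds_sketch
   implicit none
contains
   ! Hypothetical helper: upper loop bound Na*tx/nTr, evaluated in 64-bit arithmetic.
   function upper_bound(Na, tx, nTr) result(ie)
      integer(4), intent(in) :: Na, tx, nTr
      integer(8) :: ie
      ie = int(Na, 8) * tx / nTr
   end function upper_bound
end module bounds_sketch

program overflow_check
   use bounds_sketch
   implicit none
   integer(4), parameter :: Na = 134217728   ! 2**27, as in the main program
   integer(4), parameter :: nTr = 16         ! as in the main program
   integer(4) :: tx
   integer(8) :: prod

   do tx = 1, nTr
      prod = int(Na, 8) * tx                 ! intermediate product, held in 64 bits
      if (prod > huge(Na)) then
         ! huge(Na) is the largest value a 32-bit integer can hold
         print *, 'tx =', tx, ': Na*tx =', prod, ' exceeds ', huge(Na)
      end if
   end do

   print *, 'upper bound for the last thread =', upper_bound(Na, nTr, nTr)
end program overflow_check
```

With these values only tx=16 is flagged, since 2**27 * 16 = 2**31 = 2147483648 is one more than the 32-bit maximum of 2147483647. The final bound Na*tx/nTr is at most Na and fits comfortably; it is only the intermediate product that overflows.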
Mike_Texnik



Joined: 10 Oct 2010
Posts: 16

Posted: Sat Jun 30, 2012 9:34 am

Dear toepfer, thank you very much for this reply, but could you please tell me about the nature of this overflow? Why does it not affect the results on the CPU? How can I diagnose this kind of problem in the future?
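
One possible way to catch this kind of problem early is to check the CUDA error status right after the launch and again after synchronizing. The sketch below is only an illustration under several assumptions: the original stat_kernel is not shown on this page, so the kernel body here is a hypothetical reconstruction that combines the 1-based thread index, the 64-bit loop bounds, and atomicadd; the module name stat_sketch is made up; Na is reduced and a single block is launched so that each element is counted once; and cudaDeviceSynchronize is assumed to be available (older toolkits provided cudaThreadSynchronize instead). Note also that the host loops in the main program index X directly from 1 to Na and never form the product Na*tx, which is a plausible reason the CPU results look fine.

```fortran
module stat_sketch
   use cudafor
   implicit none
contains
   ! Hypothetical reconstruction of the kernel, not the original poster's code.
   attributes(global) subroutine stat_kernel(x, dist, Na, Nbin, dx, nTr)
      integer, value :: Na, Nbin, nTr
      real(4), value :: dx
      real(4) :: x(Na)
      integer :: dist(Nbin)
      integer :: tx, ij, iold
      integer(8) :: i, is, ie

      tx = threadidx%x                       ! already 1-based in CUDA Fortran
      is = int(Na, 8) * (tx - 1) / nTr + 1   ! 64-bit arithmetic for both bounds
      ie = int(Na, 8) * tx / nTr
      do i = is, ie
         ij = min(int(x(i) / dx) + 1, Nbin)  ! clamp in case rounding yields Nbin+1
         iold = atomicadd(dist(ij), 1)       ! atomic increment of the bin counter
      end do
   end subroutine stat_kernel
end module stat_sketch

program check_launch
   use cudafor
   use stat_sketch
   implicit none
   integer, parameter :: Na = 1048576, Nbin = 10, nTr = 16   ! smaller Na for a quick test
   real(4), allocatable :: x(:)
   real(4), device, allocatable :: xD(:)
   integer, device, allocatable :: distD(:)
   integer :: dist(Nbin), istat

   allocate(x(Na), xD(Na), distD(Nbin))
   call random_number(x)
   xD = x                                    ! host-to-device copy by assignment
   distD = 0

   call stat_kernel<<<1, nTr>>>(xD, distD, Na, Nbin, 1.0/real(Nbin), nTr)

   istat = cudaGetLastError()                ! errors from the launch itself
   if (istat /= cudaSuccess) print *, 'launch: ', cudaGetErrorString(istat)
   istat = cudaDeviceSynchronize()           ! errors raised while the kernel ran
   if (istat /= cudaSuccess) print *, 'kernel: ', cudaGetErrorString(istat)

   dist = distD                              ! device-to-host copy by assignment
   print *, 'total counted =', sum(dist), ' expected =', Na
   deallocate(x, xD, distD)
end program check_launch
```

Because kernel launches are asynchronous, an out-of-range index inside the kernel typically surfaces only at the synchronization point or at the next runtime call, so checking both return codes narrows down where the failure occurred. Comparing the summed bin counts against Na is a cheap sanity check that every element was visited exactly once.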
All times are GMT - 7 Hours