PGI User Forum

quick_GeForce680_x64.exe is not a valid win32 application.
Dolf



Joined: 22 Mar 2012
Posts: 128

Posted: Mon Sep 23, 2013 11:31 am    Post subject: RE:

Hi Mat,

I have a GeForce 680 and I am compiling for compute capability (CC) 3.0. I just realized that the exit of the code is because of illegal memory access. I am not sure why.

Here is how I allocate device memory and copy between device and host:

allocate(pDev(nx,ny),p1Dev(nx1,ny1),p2Dev(nx2,ny2),p3Dev(nx3,ny3),p4Dev(nx4,ny4), STAT=istat)
if (istat /= 0) print *, 'error initializing pDev matrix...'

Copy from device to host memory:
p1(1:nx1,1:ny1) = p1Dev

Copy from host to device:
pDev = p(1:nx,1:ny)

Is that the correct way to do it?
Please advise if there is a better and safer way. I am just trying to figure out what the cause of the uninitialized memory is.

Dolf
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Mon Sep 23, 2013 11:43 am    Post subject:

Quote:
I just realized that the exit of the code is because of illegal memory access
Assuming the sizes are correct and the nx/ny variables are initialized, this code looks fine. However, the "illegal memory access" may be in your kernel. Did you remember to guard your array accesses so that if you launch a kernel with more threads than there are elements in the array, you don't have these threads access the arrays?

- Mat
Dolf



Joined: 22 Mar 2012
Posts: 128

Posted: Mon Sep 23, 2013 11:50 am    Post subject: RE:

Quote:
Did you remember to guard your array accesses so that if you launch a kernel with more threads than there are elements in the array, you don't have these threads access the arrays?

I do not understand. Can you give an example? I don't think I am using this technique yet.

Dolf
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Mon Sep 23, 2013 3:45 pm    Post subject:

The following simple kernel tests that the index of the element being computed ("j") is less than or equal to the total number of elements ("n"). Without that test, if the total number of threads (set when the kernel is launched) is greater than "n", the extra threads would index past the end of the arrays and the code would get an access violation.

Having fewer threads is bad too, since not all elements would be computed.

Code:
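        attributes(global) subroutine stream_add(c, a, b, n)
          real*8, device :: c(*), a(*), b(*)
          integer, value :: n
          integer :: j
          ! 1-based global thread index across all blocks
          j = threadIdx%x + (blockIdx%x-1) * blockDim%x
          ! guard: threads past the end of the arrays do nothing
          if (j .le. n) c(j) = a(j) + b(j)
          return
        end subroutine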
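
To see how many threads the guard actually stops, the index arithmetic can be replayed on the host. The following is only a sketch in plain Fortran (no GPU needed); the module and function names (guard_sketch, global_index, threads_past_end) are made up for illustration and are not part of CUDA Fortran.

```fortran
module guard_sketch
  implicit none
contains
  ! 1-based global index, same formula as in the kernel above.
  ! tid, bid, bdim stand in for threadIdx%x, blockIdx%x, blockDim%x.
  pure integer function global_index(tid, bid, bdim) result(j)
    integer, intent(in) :: tid, bid, bdim
    j = tid + (bid - 1) * bdim
  end function global_index

  ! Count the launched threads whose index falls past n,
  ! i.e. the threads the "if (j .le. n)" test has to stop.
  pure integer function threads_past_end(n, nblocks, bdim) result(extra)
    integer, intent(in) :: n, nblocks, bdim
    integer :: bid, tid
    extra = 0
    do bid = 1, nblocks
      do tid = 1, bdim
        if (global_index(tid, bid, bdim) > n) extra = extra + 1
      end do
    end do
  end function threads_past_end
end module guard_sketch
```

With n = 1000 and 4 blocks of 256 threads, 1024 threads are launched and threads_past_end(1000, 4, 256) returns 24. Those 24 threads are the ones that would touch memory past the end of the arrays if the guard were missing.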


- Mat
Dolf



Joined: 22 Mar 2012
Posts: 128

Posted: Mon Sep 23, 2013 4:44 pm    Post subject: RE:

I applied this method to all my kernels, as shown below.

Here is how I call the kernel:
threads = dim3(16,16,1)
grid = dim3(ceiling(real(nx1)/threads%x),ceiling(real(ny1)/threads%y), 1)

call restrictPressure_kernel<<<grid,threads>>>(pDev,p1Dev,xrefDev,yrefDev,xref1Dev,yref1Dev,nx, ny, nx1,ny1,enclosingFineRectX1Dev,enclosingFineRectY1Dev)

istat = cudaThreadSynchronize()
if (istat .ne. 0 ) write(*,*) 'error restrictPressure kernel'
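
As a sanity check on the launch configuration above, the block count can also be computed with integer arithmetic alone. This is a host-only sketch in plain Fortran; the names grid_sketch, blocks_for and covers are hypothetical helpers, not CUDA Fortran routines. For mesh sizes like these it should give the same result as ceiling(real(n)/bdim); the integer form simply avoids the conversion to default real, which can no longer represent every integer once n exceeds 2**24.

```fortran
module grid_sketch
  implicit none
contains
  ! Blocks needed to cover n elements with bdim threads per block
  ! (integer ceiling division, assumes n >= 1 and bdim >= 1).
  pure integer function blocks_for(n, bdim) result(nb)
    integer, intent(in) :: n, bdim
    nb = (n + bdim - 1) / bdim
  end function blocks_for

  ! True when nb blocks reach every element and nb - 1 blocks would not.
  pure logical function covers(n, bdim, nb) result(ok)
    integer, intent(in) :: n, bdim, nb
    ok = (nb * bdim >= n) .and. ((nb - 1) * bdim < n)
  end function covers
end module grid_sketch
```

For example, blocks_for(100, 16) is 7, so a 16-wide block dimension launches 112 threads along that axis and the last 12 must be stopped by the guard inside the kernel.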

Here is the kernel subroutine:
attributes (global) subroutine restrictPressure_kernel(fineMesh, coarseMesh, xrefFine, yrefFine, xrefCoarse, yrefCoarse,nxFine, nyFine, nxCoarse, nyCoarse, enclosingFineRectX, enclosingFineRectY)

implicit none
integer, value :: nxFine, nxCoarse, nyFine, nyCoarse
real(8) :: fineMesh(nxFine, nyFine), coarseMesh(nxCoarse, nyCoarse), &
xrefFine(nxFine), yrefFine(nyFine), xrefCoarse(nxCoarse), yrefCoarse(nyCoarse)
real(8) :: enclosingFineRectX(nxCoarse), enclosingFineRectY(nyCoarse)
integer :: xIndex, yIndex, i, j
real(8) :: length, height, b, c, xx, yy, H1, H2, H3, H4

i = (blockidx%x - 1) * blockDim%x + threadidx%x
j = (blockidx%y - 1) * blockDim%y + threadidx%y

if (i <= nxCoarse) then
   if (j <= nyCoarse) then
      >> the rest of the subroutine here <<
   end if
end if

As you can see, I restricted the execution to only the valid threads.

I hope I am doing it right.
Dolf
All times are GMT - 7 Hours