PGI User Forum

CUDA-x86.

Using cudaMemCheck

 
Dolf



Joined: 22 Mar 2012
Posts: 105

Posted: Mon Nov 18, 2013 2:32 pm    Post subject: Using cudaMemCheck

Hi all,

I mentioned in a previous post that my Fortran code generates NaN (not a number) errors in the middle of execution. I have used the cuda-memcheck tool to diagnose it. I am not familiar with the tool, so I am posting what I get when running it on my executable (Quick5.exe): 12 severe errors.
I am compiling the code with the PGI Fortran 13.9 compiler (and CUDA Toolkit 5.0) under Microsoft VS 2010.

========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x1a9) [0x234c9]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarqnj_kernel_ + 0x2a0) [0x4a510]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc0) [0x87a50]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Invalid __local__ write of size 8
========= at 0x00000190 in kernels_getreynvarqnj_kernel_
========= by thread (4,12,0) in block (0,2,0)
========= Address 0x00fffc08 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuLaunchKernel + 0x1b2) [0xe042]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll [0x3706]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x1a9) [0x234c9]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarqnj_kernel_ + 0x2a0) [0x4a510]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc0) [0x87a50]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Invalid __local__ write of size 8
========= at 0x00000190 in kernels_getreynvarqnj_kernel_
========= by thread (3,12,0) in block (0,2,0)
========= Address 0x00fffc08 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuLaunchKernel + 0x1b2) [0xe042]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll [0x3706]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x1a9) [0x234c9]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarqnj_kernel_ + 0x2a0) [0x4a510]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc0) [0x87a50]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Invalid __local__ write of size 8
========= at 0x00000190 in kernels_getreynvarqnj_kernel_
========= by thread (2,12,0) in block (0,2,0)
========= Address 0x00fffc08 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuLaunchKernel + 0x1b2) [0xe042]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll [0x3706]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x1a9) [0x234c9]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarqnj_kernel_ + 0x2a0) [0x4a510]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc0) [0x87a50]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Invalid __local__ write of size 8
========= at 0x00000190 in kernels_getreynvarqnj_kernel_
========= by thread (1,12,0) in block (0,2,0)
========= Address 0x00fffc08 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuLaunchKernel + 0x1b2) [0xe042]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll [0x3706]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x1a9) [0x234c9]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarqnj_kernel_ + 0x2a0) [0x4a510]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc0) [0x87a50]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Invalid __local__ write of size 8
========= at 0x00000190 in kernels_getreynvarqnj_kernel_
========= by thread (0,12,0) in block (0,2,0)
========= Address 0x00fffc08 is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuLaunchKernel + 0x1b2) [0xe042]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll [0x3706]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x1a9) [0x234c9]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarqnj_kernel_ + 0x2a0) [0x4a510]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc0) [0x87a50]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Program hit error 30 on CUDA API call to cudaThreadSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuProfilerStop + 0xa0432) [0xbfc12]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaThreadSynchronize + 0x218) [0x1e1b8]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (cudathreadsynchronize_ + 0x12) [0xaa312]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x1dc8) [0x87a58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Program hit error 30 on CUDA API call to cudaLaunch
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuProfilerStop + 0xa0432) [0xbfc12]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x2a5) [0x235c5]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarak_kernel_ + 0x36e) [0x4a88e]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x207e) [0x87d0e]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Program hit error 30 on CUDA API call to cudaThreadSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuProfilerStop + 0xa0432) [0xbfc12]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaLaunch + 0x2a5) [0x235c5]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (kernels_getreynvarak_kernel_ + 0x36e) [0x4a88e]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x207e) [0x87d0e]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Program hit error 30 on CUDA API call to cudaThreadSynchronize
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuProfilerStop + 0xa0432) [0xbfc12]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaThreadSynchronize + 0x218) [0x1e1b8]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (cudathreadsynchronize_ + 0x12) [0xaa312]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x2086) [0x87d16]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= Program hit error 30 on CUDA API call to cudaMemcpy
========= Saved host backtrace up to driver entry point at error
========= Host Frame:C:\Windows\SYSTEM32\nvcuda.dll (cuProfilerStop + 0xa0432) [0xbfc12]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\cudart64_50_35.dll (cudaMemcpy + 0x2ae) [0x27dae]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (pgf90_dev_copyout + 0x4c) [0xa727c]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (reyneq3_ + 0x21d3) [0x87e63]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (vcycle_ + 0x3c29) [0x98239]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (fullmult_ + 0x74d) [0x989fd]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (initcasepreadapt_ + 0x2e8) [0x6bb58]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (MAIN_ + 0x7ca4) [0x67954]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (main + 0x70) [0x10e0]
========= Host Frame:C:\Users\Dolf\Desktop\quick 5 test results\run\Quick5.exe (__tmainCRTStartup + 0x136) [0x11e6e6]
========= Host Frame:C:\Windows\system32\KERNEL32.DLL (BaseThreadInitThunk + 0x1a) [0x1832]
========= Host Frame:C:\Windows\SYSTEM32\ntdll.dll (RtlUserThreadStart + 0x21) [0x5d609]
=========
========= ERROR SUMMARY: 12 errors

Any ideas? Which of these are the 12 errors I need to fix?

thanks,
Dolf
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Nov 19, 2013 6:07 pm

Hi Dolf,

The out-of-bounds errors are bad and should be fixed. Easiest thing to do would be to compile in emulation mode (-Mcuda=emu) and add bounds checking (-Mbounds). Hopefully this will show the same error and the exact spot where it occurs.

Quote:
Program hit error 30 on CUDA API call to cudaThreadSynchronize
I believe this means an out-of-bounds error in shared memory, so it may just be a continuation of the same error.

- Mat
Dolf



Joined: 22 Mar 2012
Posts: 105

Posted: Wed Nov 20, 2013 12:26 pm    Post subject: RE:

What does that mean?
How come I have that error even though I am checking for the right threads at the beginning of the kernel subroutine?


attributes (global) subroutine GetReynVarqnj_kernel(nx,ny,ndx,ndy, &
      iqpo,p,hnew,hjmin,hjmax,cohjmx,s,l,kd,zdatLow,qndatLow)

  implicit none
  integer :: i, j, k
  integer, value :: nx,ny,ndx,ndy,s,l,kd,iqpo
  real(8) :: qnj(nx,ny)
  real(8) :: zdatLow(s), qndatLow(s)
  real(8) :: zdatMid(l), qndatMid(l)
  real(8) :: zdatHigh(kd), qndatHigh(kd)
  real(8) :: p(nx,ny)
  real(8) :: hnew(ndx,ndy),hjmin(ndx,ndy),hjmax(ndx,ndy), &
             cohjmx(ndx,ndy)
  integer :: n(2)

  i = (blockidx%x -1) * blockDim%x + threadidx%x
  j = (blockidx%y -1) * blockDim%y + threadidx%y

  n(1) = size(p,1) - 1
  n(2) = size(p,2)

  if (i .ge. 2 .AND. i .le. n(1) ) then
    if (j .ge. 2 .AND. j .le. n(2) ) then

Quote:
Easiest thing to do would be to compile in emulation mode (-Mcuda=emu) and add bounds checking (-Mbounds).

I tried this option in release mode with bounds checking, but it takes an awfully long time to run (it gets stuck in the middle). Maybe I should do it in debug instead? Should I run it under cuda-memcheck, or just by itself?

Thanks,

Dolf
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Thu Nov 21, 2013 10:00 am

Quote:
what does that mean?
An out-of-bounds error means that you are accessing memory (either reading or writing) beyond the number of elements in the array. At best this is benign; at worst it will give you wrong answers or cause a memory access violation.

Quote:
how come I have that error even if I am applying checking for the right threads in the beginning of the kernel subroutine?
You have other arrays besides p, and the out-of-bounds reference could be coming from one of them. Check the sizes of the other arrays (s, l, kd, ndx, ndy) and whether they are being accessed outside these ranges.
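
To illustrate the point about the other arrays, here is a minimal host-side sketch in standard Fortran (no GPU needed). It applies the same guard as the kernel, which is derived from p only, and counts the (i,j) pairs that pass the guard yet fall outside an array dimensioned (ndx,ndy). The sizes are hypothetical, not taken from the thread, and the sketch assumes the kernel body indexes hnew with the same i and j, which the posted excerpt does not show.

```fortran
! Host-side sketch only: all sizes below are assumed for illustration.
program guard_check
  implicit none
  integer, parameter :: nx = 8, ny = 8     ! extents of p (assumed)
  integer, parameter :: ndx = 6, ndy = 8   ! extents of hnew (assumed)
  integer :: i, j, nbad

  nbad = 0
  do j = 1, ny
    do i = 1, nx
      ! Same guard as the kernel: it only looks at the extents of p.
      if (i >= 2 .and. i <= nx - 1 .and. j >= 2 .and. j <= ny) then
        ! Would hnew(i,j) be a valid reference for this (i,j)?
        if (i > ndx .or. j > ndy) then
          nbad = nbad + 1
          print '(a,i0,a,i0,a)', 'hnew(', i, ',', j, ') would be out of bounds'
        end if
      end if
    end do
  end do

  print '(a,i0)', 'indices passing the guard but outside hnew: ', nbad
end program guard_check
```

With these assumed sizes the guard lets i = 7 through for every j from 2 to 8, so seven references would land outside hnew even though every one of them is valid for p.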

Quote:
maybe I should do it in debug instead?
Debug mode uses emulation mode as well, so it would be no better. On-device debugging will be available early next year.

Quote:
or just by itself??
You can go back to the old debugging method, i.e. print statements. Printing from the device isn't formatted, though, so the output from several threads can be intermixed. Still, that would be my next step.
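
One way to keep print output manageable is to print from a single thread only, for example one of the threads named in the cuda-memcheck report. The host-side sketch below (standard Fortran) converts the reported "thread (4,12,0) in block (0,2,0)" into the kernel's i and j. It assumes that cuda-memcheck reports zero-based indices, as in CUDA C, while CUDA Fortran's threadidx and blockidx are one-based, and it assumes a 16x16 thread block. The block size is a guess; it does not appear in the thread.

```fortran
! Host-side sketch: map a cuda-memcheck thread/block report to kernel indices.
program failing_thread
  implicit none
  integer, parameter :: bdx = 16, bdy = 16  ! blockDim%x, blockDim%y (assumed)
  integer :: tx, ty, bx, by                 ! zero-based values from the report
  integer :: i, j

  ! From the report: thread (4,12,0) in block (0,2,0)
  tx = 4;  ty = 12
  bx = 0;  by = 2

  ! In CUDA Fortran, threadidx = tx + 1 and blockidx = bx + 1, so the
  ! kernel's i = (blockidx%x - 1) * blockDim%x + threadidx%x becomes:
  i = bx * bdx + (tx + 1)
  j = by * bdy + (ty + 1)

  print '(a,i0,a,i0)', 'i = ', i, ', j = ', j
end program failing_thread
```

Under those assumptions this prints i = 5, j = 45. A print statement in the kernel could then be wrapped in a test on that single (i,j) pair so that only one thread writes output.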

- Mat
All times are GMT - 7 Hours