PGI User Forum

Debugging acc kernel

AROM



Joined: 03 Apr 2013
Posts: 39

Posted: Mon Apr 29, 2013 10:02 am    Post subject: Debugging acc kernel

I have a Fortran code that I marked up with ACC directives. Launching the application results in:
Code:
call to cuMemFree returned error 700: Launch failed


cuda-memcheck reports a huge number of errors like this one:
Code:
========= Invalid __global__ read of size 4
=========     at 0x0003cb58 in mp_thompson_837_gpu
=========     by thread (56,0,0) in block (0,29,0)
=========     Address 0x0c42c9fc is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x34b) [0x54b6b]
...


I was not able to debug my application with cuda-gdb.
Is there a proper way to debug an application like this (an ACC kernel)?

The bounds-check test passed.

Alexey
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Mon Apr 29, 2013 4:04 pm

Hi Alexey,

Yes, though until we get on-device debugging supported, these issues are difficult to track down. There are multiple possible points of failure: the original source, the generated CUDA kernel, or the back-end compiler.

When diagnosing these types of issues, I typically start with the original code and use diagnostic flags such as -Mbounds, -Mchkptr, and -Mchkstk, as well as run the program under Valgrind to check for memory issues.

The next step is to comment out parts of the accelerated code to determine the line at which the error occurs. This sometimes gives a better indication of the problem. Next, I keep the generated GPU code and look for errors associated with that line number. I also adjust the loop schedule to alter the generated kernel and see whether that has an effect. Of course, if you do find that the compiler is generating bad CUDA code, we would appreciate you sending us a reproducing example.
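
Commenting out code to find the failing line is essentially a search, and when each trial rebuild is slow, bisecting needs roughly log2(N) builds instead of N. The Python sketch below illustrates that under stated assumptions: `fails(k)` is a hypothetical callback you would implement to rebuild and rerun with only the first k statements of the compute region enabled, returning True if the launch failure appears. It also assumes the failure is monotonic in k, which may not hold if disabled statements feed later ones.

```python
from typing import Callable


def first_failing_statement(n_statements: int, fails: Callable[[int], bool]) -> int:
    """Return the 0-based index of the statement whose inclusion triggers the failure.

    Assumes fails(0) is False, fails(n_statements) is True, and that once the
    first k statements fail, enabling more statements keeps failing.
    """
    lo, hi = 0, n_statements  # invariant: fails(lo) is False, fails(hi) is True
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(mid):  # hypothetical: rebuild with first `mid` statements and run
            hi = mid
        else:
            lo = mid
    return hi - 1  # enabling statement hi-1 is what flips the result
```

In practice each call to `fails` is a full compile-and-run cycle, so the saving matters most for large compute regions.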

If the generated CUDA code looks OK, my next step is to compile the back-end code without optimization (-ta=nvidia,O0).

- Mat
AROM



Joined: 03 Apr 2013
Posts: 39

Posted: Wed May 22, 2013 5:13 am

Hi Mat,

Thank you for your advice. Unfortunately, I haven't found the error in the ACC kernel yet.

Is it possible to save the kernel generated by PGI, modify it, compile it with NVCC, and substitute the modified kernel for the old one?

Alexey
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Wed May 22, 2013 10:04 am

Hi Alexey,

Quote:
Is it possible to save kernel generated by PGI,
Yes, with "-ta=nvidia,keepgpu".

Quote:
modify, compile it with NVCC and substitute old kernel with modified one.
Not really. It's possible, but if you suspect PGI is generating bad CUDA code, the better thing to do is to send a report and a reproducing example to PGI Customer Service (trs@pgroup.com). If you ask them to forward the code to me, I'll take a look and see if I can determine the issue.

- Mat

All times are GMT - 7 Hours