AROM
Joined: 03 Apr 2013 Posts: 8
|
Posted: Mon Apr 29, 2013 10:02 am Post subject: Debugging acc kernel |
|
|
I have Fortran code that I annotated with ACC directives. Launching the application results in
| Code: | | call to cuMemFree returned error 700: Launch failed |
cuda-memcheck reports a large number of errors such as
| Code: | ========= Invalid __global__ read of size 4
========= at 0x0003cb58 in mp_thompson_837_gpu
========= by thread (56,0,0) in block (0,29,0)
========= Address 0x0c42c9fc is out of bounds
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame:/usr/lib/libcuda.so (cuLaunchKernel + 0x34b) [0x54b6b]
... |
I was unable to debug the application with cuda-gdb.
Is there a proper way to debug this kind of application (an ACC kernel)?
The bounds-checking test passed.
Alexey |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Apr 29, 2013 4:04 pm Post subject: |
|
|
Hi Alexey,
Yes, until we get on-device debugging supported, these issues are difficult to diagnose. There are multiple possible points of failure here: the original source, the generated CUDA kernel, or the back-end compiler.
When diagnosing these types of issues, I typically start with the original code and use diagnostic flags such as -Mbounds, -Mchkptr, and -Mchkstk, and I also run the program under Valgrind to check for memory issues.
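The host-side checks described above could be scripted roughly as follows. This is a minimal sketch, not a definitive recipe: the file names `app.f90` and `app_host` are hypothetical, the `pgfortran` driver name and the `-g` flag are assumptions added for illustration, and only `-Mbounds`, `-Mchkptr`, `-Mchkstk`, and Valgrind come from the post itself.

```shell
#!/bin/sh
# Host-only debug build. Without -acc or -ta the OpenACC directives are treated
# as comments, so the loops run on the CPU with runtime checks enabled.
SRC="app.f90"       # hypothetical source file name
EXE="app_host"      # hypothetical executable name

# -Mbounds, -Mchkptr and -Mchkstk are the flags named in the post;
# -g is an added assumption so that diagnostics carry line information.
HOST_FLAGS="-g -Mbounds -Mchkptr -Mchkstk"

BUILD_CMD="pgfortran $HOST_FLAGS -o $EXE $SRC"
RUN_CMD="valgrind ./$EXE"

# Print the commands so the script is useful even as a dry run.
echo "$BUILD_CMD"
echo "$RUN_CMD"

# Execute only when the compiler and the source file are actually available.
if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $BUILD_CMD && $RUN_CMD
fi
```

If the host build runs cleanly under these checks, the original source is a less likely culprit and attention can shift to the generated kernel.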
The next step is to comment out parts of the accelerated code to determine the line at which the error occurs. This sometimes gives a better indication of the problem. Next, I keep the generated GPU code and look for errors associated with this line number. I also adjust the loop schedule to change the generated kernel and see whether this has an effect. Of course, if you do find that the compiler is generating bad CUDA code, we would appreciate you sending us a reproducing example.
If the generated CUDA code looks OK, my next step is to compile the back-end code without optimization (-ta=nvidia,O0).
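As an illustration of the unoptimized back-end build, the sketch below rebuilds with `-ta=nvidia,O0` and reruns the program under cuda-memcheck. The file names are hypothetical, and `pgfortran`, `-acc`, and `-Minfo=accel` are assumptions added here; only the `-ta=nvidia,O0` sub-option and cuda-memcheck appear in the thread.

```shell
#!/bin/sh
# GPU build with back-end optimization disabled, then a memory check of the run.
SRC="app.f90"        # hypothetical source file name
EXE="app_gpu_O0"     # hypothetical executable name

# -ta=nvidia,O0 is the sub-option named in the post; -acc and -Minfo=accel
# are added assumptions (enable OpenACC and print accelerator compile messages).
ACC_FLAGS="-acc -Minfo=accel -ta=nvidia,O0"

BUILD_CMD="pgfortran $ACC_FLAGS -o $EXE $SRC"
CHECK_CMD="cuda-memcheck ./$EXE"

# Print the commands so the script is useful even as a dry run.
echo "$BUILD_CMD"
echo "$CHECK_CMD"

# Execute only when the compiler and the source file are actually available.
if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $BUILD_CMD && $CHECK_CMD
fi
```

If the out-of-bounds reads disappear in this build, that would point toward back-end optimization rather than the source, though it does not prove it.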
- Mat |
|
AROM
Joined: 03 Apr 2013 Posts: 8
|
Posted: Wed May 22, 2013 5:13 am Post subject: |
|
|
Hi Mat,
Thank you for your advice. Unfortunately, I haven't found the error in the ACC kernel yet.
Is it possible to save the kernel generated by PGI, modify it, compile it with NVCC, and substitute the modified kernel for the original one?
Alexey |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Wed May 22, 2013 10:04 am Post subject: |
|
|
Hi Alexey,
| Quote: | Is it possible to save kernel generated by PGI, |
Yes: "-ta=nvidia,keepgpu".
| Quote: | modify, compile it with NVCC and substitute old kernel with modified one. |
Not really. It's possible, but if you suspect PGI is generating bad CUDA code, the better course is to send a report and a reproducing example to PGI Customer Service (trs@pgroup.com). If you ask them to forward the code to me, I'll take a look to see if I can determine the issue.
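For reference, keeping the generated kernel source might look like the sketch below. The source file name is hypothetical, `pgfortran` and `-acc` are assumptions, and the `.gpu` suffix of the kept file is also an assumption that may differ between compiler versions. The kernel name is the one reported by cuda-memcheck in the first post.

```shell
#!/bin/sh
# Compile with keepgpu so the generated kernel source is left on disk.
SRC="app.f90"                    # hypothetical source file name
KERNEL="mp_thompson_837_gpu"     # kernel name from the cuda-memcheck output

# -ta=nvidia,keepgpu is the option named in the post; -acc is an added assumption.
KEEP_FLAGS="-acc -ta=nvidia,keepgpu"
BUILD_CMD="pgfortran $KEEP_FLAGS -c $SRC"

# Print the command so the script is useful even as a dry run.
echo "$BUILD_CMD"

# Execute only when the compiler and the source file are actually available.
if command -v pgfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $BUILD_CMD
    # Assumption: the kept kernel source carries a .gpu suffix.
    grep -n "$KERNEL" ./*.gpu
fi
```

Locating the failing kernel in the kept file makes it easier to attach the relevant generated code to a report.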
- Mat |
|