alfvenwave
Joined: 08 Apr 2010 Posts: 77
Posted: Mon Apr 12, 2010 5:27 am Post subject: Hitting card memory limit crashes code and hangs card
Hi.
I am having a problem with GPUs being left in a hung state. The code I am running drives multiple GPUs from within an OpenMP parallel loop. It works fine with a single thread, but when running on multiple cards it sometimes crashes because a card runs out of memory. That by itself would be fine, except that one or more cards are often left hung afterwards. Rebooting the machine fixes this, but the machine is part of a cluster, so that is not a very satisfactory solution. Is there some way of restarting graphics cards from the command line if they end up in a jammed state?
Rob.
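Since the crashes come from running out of card memory, it may help to watch how close each GPU gets to its limit while the job runs. Below is a minimal sketch of a hypothetical helper (the script name `gpu-mem-check.sh`, the function `flag_full_gpus` and the 90% default threshold are all illustrative assumptions). It filters the CSV output of `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`. That query syntax belongs to newer nvidia-smi releases and may not be available with the 2010-era driver discussed in this thread.

```shell
#!/usr/bin/env bash
# gpu-mem-check.sh: hypothetical helper that flags GPUs whose memory use is
# close to the card's limit. Assumes a newer nvidia-smi supporting --query-gpu.

# Read "index, used, total" CSV lines (values in MiB) on stdin and print one
# warning line per GPU using at least THRESHOLD percent of its memory.
flag_full_gpus() {
    local threshold="${1:-90}"
    awk -F', *' -v t="$threshold" '
        NF == 3 && $3 > 0 {
            pct = 100 * $2 / $3
            if (pct >= t)
                printf "GPU %s: %d of %d MiB used (%.0f%%)\n", $1, $2, $3, pct
        }'
}

# Only query the hardware when called as: gpu-mem-check.sh check [THRESHOLD]
# Without arguments the file just defines the function above.
if [ "${1:-}" = "check" ]; then
    nvidia-smi --query-gpu=index,memory.used,memory.total \
               --format=csv,noheader,nounits | flag_full_gpus "${2:-90}"
fi
```

For example, `bash gpu-mem-check.sh check 80` would list any card above 80% of its memory, and running it under `watch -n 5` gives a rough live view during a multi-GPU run.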
dholt
Joined: 30 Jul 2008 Posts: 15 Location: The Portland Group Inc.
Posted: Mon Apr 12, 2010 10:51 am
Hi Rob,
If you're on Linux, you can try running (as root):

    modprobe -vr nvidia

to unload the driver, followed by:

    modprobe -v nvidia

to load it again.
You will need to kill off any running processes using the GPU beforehand, and in some cases, if something has the GPU tied up, you won't be able to unload the driver. At that point the only solution I know of is to restart the system.
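The unload/reload steps above, together with the check for processes still holding the GPU, can be wrapped in a small script. This is a minimal sketch, not a tested tool: it assumes bash, root access, psmisc's `fuser`, and an NVIDIA driver built as loadable modules. The script name and its functions are hypothetical, and the sub-module names `nvidia_drm`, `nvidia_modeset` and `nvidia_uvm` only exist in newer drivers (older drivers load just `nvidia`), so `NVIDIA_MODULES` should be adjusted to whatever `lsmod` shows on the machine.

```shell
#!/usr/bin/env bash
# gpu-reset.sh: hypothetical wrapper around "modprobe -r" / "modprobe" for the
# NVIDIA driver. Assumes Linux, root, psmisc's fuser, modular NVIDIA driver.

# Every module is listed before the modules it depends on, so unloading in
# this order never removes a module that something else still needs.
NVIDIA_MODULES=(nvidia_drm nvidia_modeset nvidia_uvm nvidia)

# Succeed if kernel module $1 is listed in /proc/modules (or in file $2).
is_module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

reset_gpu_driver() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "reset_gpu_driver: must be run as root" >&2
        return 1
    fi

    # The driver cannot be unloaded while a process holds a /dev/nvidia* file.
    local devs=() dev m
    for dev in /dev/nvidia*; do
        [ -e "$dev" ] && devs+=("$dev")
    done
    if [ "${#devs[@]}" -gt 0 ] && fuser -s "${devs[@]}"; then
        echo "GPU still in use; stop these processes first:" >&2
        fuser -v "${devs[@]}" || true
        return 1
    fi

    # Unload dependents first. Re-check each module before removing it,
    # because modprobe -r may already have removed an unused dependency.
    for m in "${NVIDIA_MODULES[@]}"; do
        if is_module_loaded "$m"; then
            modprobe -v -r "$m" || return 1
        fi
    done

    # Load the core driver again.
    modprobe -v nvidia
}

# Only act when explicitly asked; without arguments the file just defines
# the functions above.
if [ "${1:-}" = "reset" ]; then
    reset_gpu_driver
fi
```

Running `bash gpu-reset.sh reset` as root either reloads the driver or lists the processes that must be stopped first. The script refuses to continue rather than killing those processes automatically, since on a shared cluster they may belong to other users.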
alfvenwave
Joined: 08 Apr 2010 Posts: 77
Posted: Mon Apr 12, 2010 11:00 am
Great, thanks for this. I have managed to persuade our system administrator to give me root access for now. It's a bit annoying that they hang so often. Is this common?
Rob.
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Apr 12, 2010 11:11 am
While I don't know if it's common for others, my device hangs periodically as well. Like you, it occurs more often when I'm running code that hits a device limit or encounters some other error.
- Mat
alfvenwave
Joined: 08 Apr 2010 Posts: 77
Posted: Tue Apr 13, 2010 3:45 am
Hi Mat. It's a bit annoying, but now that I have root access I can reboot as necessary. My next thread shows why I keep hitting these problems.
Thanks,
Rob.