PGI User Forum

OpenACC program takes two GPUs (instead of one)

 
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Fri Jul 26, 2013 2:15 am    Post subject: OpenACC program takes two GPUs (instead of one)

Hello,
Since the 13.x compilers (we currently use 13.6), we have had severe problems running OpenACC programs on our nodes, each of which has two NVIDIA Quadro 6000 (Fermi) GPUs. The problem is that any OpenACC program takes BOTH GPUs instead of only one. If we start, e.g., a Jacobi solver on one GPU (without setting any device number), it runs on device 0 AND device 1. The program neither reports this nor prints its output twice, but you can still see both executions with "nvidia-smi" (see below).

Code:
>$ nvidia-smi
Fri Jul 26 10:42:01 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.310.40   Driver Version: 310.40         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 6000              | 0000:02:00.0     Off |                    0 |
| 30%   78C    P0    N/A /  N/A |   6%  324MB / 5375MB |     79%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Quadro 6000              | 0000:85:00.0      On |                    0 |
| 30%   74C    P0    N/A /  N/A |   2%   98MB / 5375MB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     29538  ./laplace_openacc                                    362MB  |
|    1     29538  ./laplace_openacc                                    362MB  |
+-----------------------------------------------------------------------------+


The problem is that our GPUs are set to the "exclusive process" compute mode. This prevents any other user from starting a GPU program while one OpenACC program (occupying both GPUs) is running, which is really bad for us.
We have the same problem with MPI programs from a single user. If an MPI program runs two processes on one node and each process is supposed to talk to one GPU (chosen by rank number), it does not work: while initializing the first device (acc_init), the first process takes both GPUs, so the second process gets a context error and terminates.
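
The rank-to-device mapping described above could be sketched roughly as follows. This is only an illustration, not the poster's code: `device_for_rank` and `bind_rank_to_device` are hypothetical helper names, and the round-robin mapping is an assumption. The OpenACC runtime calls (`acc_get_num_devices`, `acc_set_device_num`) are guarded by `_OPENACC` so the file also builds with a plain C compiler. The sketch deliberately avoids acc_init, since that is the call reported above to take both GPUs. Whether selecting the device this way avoids the problem on the affected releases is not confirmed in this thread.

```c
#include <assert.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: map a node-local rank to a device index,
 * round-robin over the devices available on the node. */
int device_for_rank(int local_rank, int num_devices)
{
    assert(local_rank >= 0);
    assert(num_devices > 0);
    return local_rank % num_devices;
}

/* Hypothetical helper: bind the calling process to a single GPU.
 * Returns the chosen device index, or -1 if the program was built
 * without OpenACC or no NVIDIA device is visible. */
int bind_rank_to_device(int local_rank)
{
#ifdef _OPENACC
    int num_devices = acc_get_num_devices(acc_device_nvidia);
    if (num_devices <= 0)
        return -1;
    int dev = device_for_rank(local_rank, num_devices);
    /* Select the device explicitly; no acc_init call is made here. */
    acc_set_device_num(dev, acc_device_nvidia);
    return dev;
#else
    (void)local_rank;
    return -1;
#endif
}
```

In an MPI program, `local_rank` would be the rank within the node rather than the global rank, for example obtained by splitting the communicator per node; with two processes and two GPUs this gives rank 0 device 0 and rank 1 device 1.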

Do you have any ideas for a workaround? Will this be fixed in the next compiler releases? (It was not an issue with 12.9, for example.)
Thanks, Sandra
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Jul 26, 2013 10:10 am

Hi Sandra,

Thanks for the report. I was able to reproduce the behavior and sent a report (TPR#19494) to engineering. They should have a fix in place in the near future.

In the meantime, the workaround is to remove the call to acc_init.
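
As a rough sketch of what that looks like in practice: the routine below contains no acc_init call at all and relies on the OpenACC runtime initializing the device implicitly when the first compute region is reached. The function name `jacobi_sweep` and the 1D Laplace stencil are assumptions chosen for illustration, not code from the original report. Built without OpenACC support, the pragma is ignored and the loop runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi sweep for a 1D Laplace problem (illustrative only).
 * Each interior point becomes the average of its two neighbours;
 * the boundary values are carried over unchanged.
 * Note: there is no acc_init call anywhere. The device is set up
 * implicitly by the runtime at the first compute region. */
void jacobi_sweep(const double *in, double *out, int n)
{
    assert(in != NULL && out != NULL);
    if (n < 1)
        return;

    /* Boundaries are set on the host, then travel with copy(out). */
    out[0] = in[0];
    out[n - 1] = in[n - 1];

    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n])
    for (int i = 1; i < n - 1; ++i)
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);
}
```

`copy` rather than `copyout` is used for `out` so that the boundary values written on the host are not overwritten by uninitialized device memory when the region ends.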

Best Regards,
Mat
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Tue Jul 30, 2013 3:55 am

Thanks Mat. The workaround is working for me.
jtull



Joined: 30 Jun 2004
Posts: 438

Posted: Tue Sep 09, 2014 5:13 pm    Post subject: TPR 19494 is fixed in 14.9

TPR 19494 - OACC: Using acc_init reserves all devices on a system
has been fixed in the 14.9 release of the PGI compilers.

Thanks for your report.

dave
All times are GMT - 7 Hours