KarlW
Joined: 12 Jan 2009 Posts: 23
Posted: Thu Jul 30, 2009 1:03 am Post subject: Using multiple GPUs |
Hi,
I'm trying to run different code simultaneously on 2 GPUs. I'm under the impression that this requires OpenMP but I can't seem to get the code to work.
pgaccelinfo picks up both devices, and I am using the most recent CUDA drivers, etc.
The code at the bottom of the post results in the following messages at run time and then freezes:
call gpu code
number of threads: 2
Section 1, thread: 0
test 1
number of threads: 2
Section 2, thread: 1
no devices found, exiting
launch kernel file=gpu_xyzint_1_openmptest.f90 function=gpu_xyzint_1 line=969 grid=1 block=15
! Excerpt: assumes the enclosing routine has "use omp_lib" and
! "use accel_lib" and declares tid, nthreads, i, pint, qint, rint.
!$OMP PARALLEL PRIVATE(tid) SHARED(pint,qint,rint)
      tid = OMP_GET_THREAD_NUM()
      if (tid.eq.0) then
         nthreads = OMP_GET_NUM_THREADS()
         ! Print from the master thread only, to avoid a race on nthreads.
         print *, 'number of threads:', nthreads
      end if
!$OMP SECTIONS
!$OMP SECTION
      print *, 'Section 1, thread:', OMP_GET_THREAD_NUM()
      print *, 'test 1'
      ! Select the device before entering the accelerator region.
      call acc_set_device_num(0,acc_device_default)
!$acc region
!$acc do
      do i=1,15
         pint(i) = 0
         qint(i) = 0
         rint(i) = 0
      end do
!$acc end region
!     call gpucode(ngpu,lgpu)
!$OMP SECTION
      print *, 'Section 2, thread:', OMP_GET_THREAD_NUM()
      ! The device must be selected before the region starts, not inside it.
      call acc_set_device_num(1,acc_device_default)
!$acc region
!$acc do
      do i=16,31
         pint(i) = 0
         qint(i) = 0
         rint(i) = 0
      end do
!$acc end region
!     call gpucode(ngpu,lgpu)
!$OMP END SECTIONS NOWAIT
!$OMP END PARALLEL
As you can see, I have replaced the call to a separate accelerated subroutine with some simple inline code. Would the subroutine call work when used this way?
Many thanks,
Karl
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Thu Jul 30, 2009 2:02 pm Post subject: |
Hi Karl,
Unfortunately, support for using accelerator regions within OpenMP parallel regions is not in 9.0 yet (see http://www.pgroup.com/userforum/viewtopic.php?t=1490). We are actively working on adding it and, if all goes well, expect preliminary support in September's 9.0-4 monthly release.
That said, I'm not sure where the "no devices found, exiting" error is coming from. I worked up a small test case from your sample but get the error "libcuda.so not found, exiting" instead. You're welcome to send me the code and I'll see what's going on.
Thanks,
Mat
KarlW
Joined: 12 Jan 2009 Posts: 23
Posted: Thu Jul 30, 2009 10:27 pm Post subject: |
Hi Mat,
I was initially getting the libcuda.so error, but it went away when I installed the latest CUDA version. I had thought that issue came from the installation of the second GPU, though.
I've emailed the code to you.
Many thanks,
Karl
Edit:
Is there any other way to run different !$acc regions simultaneously on different GPUs?
Also, are there examples anywhere on running a normal region on multiple GPUs?
Cheers!
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Fri Jul 31, 2009 1:36 pm Post subject: |
Hi Karl,
Quote: "Is there any other way to run different !$acc regions simultaneously on different GPUs?"
You can use MPI.
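The basic idea: run one MPI process per GPU, and have each process bind itself to a device before its accelerator region runs. Here's a rough, untested sketch; it reuses the acc_set_device_num/acc_device_default interface from your code and assumes our accel_lib module is available:
program mpi_acc_sketch
   use accel_lib        ! acc_set_device_num, acc_device_default
   implicit none
   include 'mpif.h'
   integer :: rank, ierr, i
   real :: a(1000)
   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   ! Bind this process to device "rank" before any accelerator region runs.
   call acc_set_device_num(rank, acc_device_default)
!$acc region
   do i = 1, 1000
      a(i) = real(i + rank)
   end do
!$acc end region
   print *, 'rank', rank, 'done, a(1) =', a(1)
   call MPI_Finalize(ierr)
end program mpi_acc_sketch
Launch it with one process per GPU (e.g. mpirun -np 2 on your two-GPU box) and each process does its own work on its own device.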
Quote: "Also, are there examples anywhere on running a normal region on multiple GPUs?"
Besides MPI (and, in the future, OpenMP or pthreads), we do not support dividing a single accelerator region across multiple devices. This is an evolving model, though, so that may become possible later.
- Mat
TheMatt
Joined: 06 Jul 2009 Posts: 263 Location: Greenbelt, MD
Posted: Fri Aug 07, 2009 11:47 am Post subject: |
mkcolg wrote:
Hi Karl,
Quote: "Is there any other way to run different !$acc regions simultaneously on different GPUs?"
You can use MPI.
- Mat
Okay, Mat, now I have a question: how does one use MPI and !$acc together?
I currently have a big, big program that uses MPI, and I'm thinking of accelerating a small part of it, way down in the code tree, that accounts for 25-30% of the CPU time (it should be fairly CUDA friendly: no intercommunication, etc.).
The CUDA testbed I'm using has 4 CPUs and a Tesla S1070 (= 4 GPUs), so I have a nice one-to-one ratio. If I used the accelerator pragmas and ran this with mpirun -np 4, would it "automagically" have rank n use GPU n, or do I need to add additional logic to the code?
I'm assuming the latter, so is there an example PGI has that shows how to do that? (Of course, I'm hoping for the former!)
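If it is the latter, my guess is the extra logic would be a one-time call right after MPI_Init, something like this (untested on my end, and I'm assuming acc_get_num_devices exists in accel_lib alongside acc_set_device_num):
! My untested guess at the rank-to-GPU binding I'd have to add;
! assumes acc_get_num_devices/acc_set_device_num from PGI's accel_lib.
subroutine bind_rank_to_gpu()
   use accel_lib
   implicit none
   include 'mpif.h'
   integer :: rank, ndev, ierr
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
   ndev = acc_get_num_devices(acc_device_nvidia)
   ! Wrap around in case there are more MPI ranks than GPUs.
   if (ndev > 0) call acc_set_device_num(mod(rank, ndev), acc_device_nvidia)
end subroutine bind_rank_to_gpu
I'd call that once per process, before any accelerated code runs.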
Matt