PGI User Forum

About two or more GPUs
Teslalady
Joined: 16 Mar 2012
Posts: 75

Posted: Fri Mar 30, 2012 6:22 pm    Post subject: About two or more GPUs

Does the compiler support two or more GPUs in the same program?

We found the statement on your website that the current release does not include support for automatically controlling two or more GPUs from the same accelerator region.
Since our system has four GPUs, when will a new release support multiple GPUs in the same program?
TheMatt
Joined: 06 Jul 2009
Posts: 317
Location: Greenbelt, MD

Posted: Mon Apr 02, 2012 7:28 am

The compiler supports more than one GPU just fine; I often run on 32+. The only thing you need to do is set the device in a logical way and use another multiprocessing framework on top.

For example, I use MPI to partition the work and then, in each MPI process, do some 'mod 2' math to make sure process 0 uses GPU 0 and process 1 uses GPU 1, say. You can do this either with CUDA API calls in CUDA Fortran or with acc_set_device_num (and the associated runtime library routines) when using the accelerator pragmas (and with OpenACC too, I think, as of 12.3).
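A minimal sketch of that rank-to-device mapping, assuming an MPI library and the OpenACC runtime routines from the openacc module (the program name and variables are illustrative, not Matt's actual code):

Code:
! Hedged sketch: bind each MPI rank to one GPU via acc_set_device_num,
! wrapping around with mod() if there are more ranks than devices.
program rank_to_gpu
   use mpi
   use openacc
   implicit none
   integer :: ierr, rank, ndev

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ndev = acc_get_num_devices(acc_device_nvidia)
   if (ndev > 0) call acc_set_device_num(mod(rank, ndev), acc_device_nvidia)

   ! ... accelerator regions launched by this rank now run on its own GPU ...

   call MPI_Finalize(ierr)
end program rank_to_gpu

Each rank then works on its own slice of the data, so no single accelerator region ever has to manage more than one device's memory.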

Matt
mkcolg
Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Mon Apr 02, 2012 8:42 am

Quote:
does not include support for automatically controlling two or more GPUs from the same accelerator region.
Multi-GPU support needs to be implemented using a CPU parallel model such as OpenMP or MPI. The complexity of managing discrete memories makes automatic decomposition across multiple GPUs impractical.
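If you would rather stay within a single process, here is a rough OpenMP sketch of the same idea, assuming the omp_lib and openacc modules (names are illustrative, not taken from the articles below):

Code:
! Hedged sketch: one OpenMP thread per GPU, each pinned to its own device.
program threads_to_gpus
   use omp_lib
   use openacc
   implicit none
   integer :: ndev, tid

   ndev = acc_get_num_devices(acc_device_nvidia)

   !$omp parallel num_threads(ndev) private(tid)
   tid = omp_get_thread_num()
   call acc_set_device_num(tid, acc_device_nvidia)
   ! ... each thread works on its own block of the data and launches
   !     its own accelerator regions on device 'tid' ...
   !$omp end parallel
end program threads_to_gpus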

I've written two articles on multi-GPU programming that you may find helpful. The first uses CUDA Fortran, "Multi-GPU Programming Using CUDA Fortran, MPI, and GPUDirect", and the second uses the PGI Accelerator Model, "5x in 5 Hours: Porting a 3D Elastic Wave Simulator to GPUs Using PGI Accelerator". Both programs use MPI since I was targeting clusters, but I also find MPI easier to use when working with multiple GPUs.

- Mat
Teslalady
Joined: 16 Mar 2012
Posts: 75

Posted: Mon May 14, 2012 6:49 am

Hi Mat, I have two GPUs in one system: one is a Quadro 4000 and the other is a Tesla C2050. When I run pgaccelinfo, I get the information for both GPUs, but how can I tell which GPU accelerates my code? Should I add the -Minfo flag?

I used the -Minfo flag and got the output below:

[zhanghw@localhost openacc]$ pgfortran -o f2a.exe acc_f2a.f90 -acc -Minfo=accel -fast
NOTE: your trial license will expire in 13 days, 13.4 hours.
NOTE: your trial license will expire in 13 days, 13.4 hours.
main:
27, Generating copyin(a(1:n))
Generating copyout(r(1:n))
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
28, Loop is parallelizable
Accelerator kernel generated
28, !$acc loop gang, vector(256) ! blockidx%x threadidx%x
CC 1.0 : 12 registers; 56 shared, 112 constant, 28 local memory bytes; 66% occupancy
CC 2.0 : 15 registers; 4 shared, 136 constant, 4 local memory bytes; 100% occupancy

But both of my GPUs should be CC 2.0.
mkcolg
Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Mon May 14, 2012 9:06 am

Hi Teslalady,

In order to have your program use both GPUs, you will need to use a higher-level parallel model, OpenMP or MPI. To select which GPU a binary uses, either call the OpenACC runtime routine "acc_set_device_num" from within your program or set the environment variable "ACC_DEVICE_NUM" to the device you wish to use. If neither is set, the default is to use device 0.
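If you just want to confirm which device a run picks up, a small sketch along those lines (assuming the OpenACC runtime routines acc_get_num_devices and acc_get_device_num; not from Mat's post):

Code:
! Hedged sketch: report the NVIDIA devices the OpenACC runtime sees
! and which one the current run will use.
program which_gpu
   use openacc
   implicit none
   print *, 'NVIDIA devices found:', acc_get_num_devices(acc_device_nvidia)
   print *, 'device selected     :', acc_get_device_num(acc_device_nvidia)
end program which_gpu

Run it as-is to see the default (device 0), or set ACC_DEVICE_NUM=1 before running to target the second GPU.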

Quote:
But both of my GPUs should be CC 2.0.
Since the build can be done on a system different from where the binary is run, the compiler does not use information about the GPUs attached to the build system. Instead, it generates multiple embedded device binaries, in this case for compute capabilities 1.0 and 2.0. At runtime, the appropriate binary is selected. If you know this binary will never be run on any other devices and don't want the small amount of extra space used to store the 1.0 version, then add the flag "-ta=nvidia,2.0" to target only a CC 2.0 device.

- Mat