MaciejG
Joined: 04 Jun 2013 Posts: 2
Posted: Tue Jun 04, 2013 4:42 pm Post subject: Using constant memory in CUDA Fortran with multiple GPUs
Hello,
I'm developing a program in CUDA Fortran and trying to use multiple GPUs from a single host thread. The original single-GPU code used global and constant memory, and when adding multi-GPU support I could not find a way to specify on which device to place Fortran variables declared with the "constant" attribute.
I have tried this:
   integer, constant :: iconst
   ...
   do dev = 0, maxdev
      ignore = cudaSetDevice(dev)
      iconst = 1
   end do
This compiles and runs, but trying to access iconst from a kernel launched on a higher-numbered device results in an "unspecified launch failure".
Is there a way to specify the placement of variables in constant memory on a specific device? I looked through the user manual and "CUDA Fortran for Scientists and Engineers", but there is little information on multi-GPU support in general.
Thanks,
Maciej
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Jun 05, 2013 9:38 am
Hi Maciej,
Did you set up peer-to-peer communication first? It's required in order to use GPUDirect.
My article on multi-GPU programming in CUDA Fortran has a section on GPUDirect (part 4), including the set-up code: http://www.pgroup.com/lit/articles/insider/v3n3a2.htm. While I don't use constant memory in that example, I went back and tried adding some constant variables, and it worked as expected. If you continue to encounter issues, let me know and we can work through them.
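For reference, the peer-to-peer set-up looks roughly like the following sketch (based on the cudafor peer-access API; error handling is trimmed, and it assumes you want every device pair enabled):

```fortran
! Sketch: enable peer-to-peer access between every pair of devices.
! Assumes the cudafor module; error checks on istat are omitted.
use cudafor
integer :: ndev, i, j, istat, canAccess

istat = cudaGetDeviceCount(ndev)
do i = 0, ndev-1
   istat = cudaSetDevice(i)          ! calls below apply to device i
   do j = 0, ndev-1
      if (i /= j) then
         ! Check the hardware actually supports peer access i -> j
         istat = cudaDeviceCanAccessPeer(canAccess, i, j)
         if (canAccess == 1) then
            istat = cudaDeviceEnablePeerAccess(j, 0)
         end if
      end if
   end do
end do
```

Note that cudaDeviceEnablePeerAccess is per-direction: it must be called once with each device current to enable access in both directions, which is what the double loop above does.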
- Mat
MaciejG
Posted: Wed Jun 05, 2013 4:40 pm
Hi Mat,
Thanks for your reply.
I'm not sure GPUDirect is actually relevant to what I'm trying to achieve. My understanding is that GPUDirect is required when a kernel running on device 1 accesses constant memory on device 0 - is that correct?
What I am trying to do is have kernels running on device 0 access constant memory on device 0, and kernels on device 1 access constant memory on device 1. But it is not clear to me how to specify, in Fortran, that a variable declared with the constant attribute is placed in the constant memory of device 1 or 2 instead of device 0.
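In other words, the pattern I'm after is roughly this (a sketch, not working code - mykernel, grid, and block are placeholders, and the per-device assignment is exactly what fails for me on devices above 0):

```fortran
module const_mod
   use cudafor
   integer, constant :: iconst
contains
   attributes(global) subroutine mykernel()
      ! ... reads iconst from the current device's constant memory ...
   end subroutine mykernel
end module const_mod

! Host side: set iconst and launch on each device in turn.
do dev = 0, maxdev
   ignore = cudaSetDevice(dev)
   iconst = dev + 1                  ! intended: copy into THIS device's constant memory
   call mykernel<<<grid, block>>>()  ! intended: kernel sees this device's copy
end do
```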
Is there a way of achieving this in CUDA Fortran (PGI 13.2 and CUDA 5.0), or is it something that is not currently supported?
Cheers,
Maciej
mkcolg
Posted: Thu Jun 06, 2013 11:17 am
I believe multiple contexts are actually created, hence you need to establish peer-to-peer access so you can manage them. Granted, I've only done a little work with using multiple GPUs from a single host thread, so there may be a better way, but using peer-to-peer seems to work.
Personally, I much prefer using MPI and binding a single GPU context to each MPI process. I find it logically easier to manage and cleaner to implement, and it scales better. Of course, do what's best for your program.
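The one-GPU-per-rank scheme might be sketched like this (an assumption-laden sketch: the mod(rank, ndev) mapping presumes ranks are packed one per GPU on each node):

```fortran
program mpi_gpu_bind
   use mpi
   use cudafor
   implicit none
   integer :: ierr, rank, ndev, istat

   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

   ! Bind this rank to one device. Each process then owns a single
   ! GPU context, so constant-memory variables behave exactly as in
   ! the single-GPU case - no peer-to-peer set-up needed.
   istat = cudaGetDeviceCount(ndev)
   istat = cudaSetDevice(mod(rank, ndev))

   ! ... per-rank work: set constant variables, launch kernels ...

   call MPI_Finalize(ierr)
end program mpi_gpu_bind
```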
- Mat