PGI User Forum

hyper-q
mkcolg



Joined: 30 Jun 2004
Posts: 5000
Location: The Portland Group Inc.

Posted: Fri May 10, 2013 9:02 am

Hi Jan,

The CUDA proxy daemon is news to me, so I'd need to refer you to NVIDIA for more information. I agree with the commenter who was surprised that you'd need to run it. It's my understanding that you don't need to do anything special. Note that CUDA Fortran uses the same underlying mechanisms as CUDA C, so anything that applies to CUDA C will apply to CUDA Fortran.

Back in 2011, I wrote an article on multi-GPU programming using MPI and CUDA Fortran. See: http://www.pgroup.com/lit/articles/insider/v3n3a2.htm. At the time, I wrote the statement "setting up more than one Context on a single device is not supported." However, this is now incorrect for Kepler, given that Hyper-Q allows multiple contexts. While this code isn't a good performance benchmark since it does so little work, you can use it to test multiple MPI processes attaching to a single device.

- Mat
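
A minimal sketch of the rank-to-device mapping that such a test relies on, written in Python so it runs without a GPU or an MPI installation. The function names `device_for_rank` and `ranks_per_device` and the round-robin policy are illustrative assumptions, not code from the article; in a real CUDA C or CUDA Fortran program the chosen index would be passed to `cudaSetDevice`, and the local rank would come from the MPI launcher.

```python
from collections import defaultdict


def device_for_rank(local_rank: int, num_devices: int) -> int:
    """Pick a GPU index for an MPI rank by round-robin (hypothetical policy)."""
    if num_devices < 1:
        raise ValueError("at least one device is required")
    return local_rank % num_devices


def ranks_per_device(num_ranks: int, num_devices: int) -> dict:
    """Group ranks by the device they would attach to."""
    sharing = defaultdict(list)
    for rank in range(num_ranks):
        sharing[device_for_rank(rank, num_devices)].append(rank)
    return dict(sharing)


if __name__ == "__main__":
    # Four ranks on a node with one GPU: all of them share device 0.
    print(ranks_per_device(4, 1))  # {0: [0, 1, 2, 3]}
    # Four ranks on a node with two GPUs: two ranks per device.
    print(ranks_per_device(4, 2))  # {0: [0, 2], 1: [1, 3]}
```

With more ranks than devices, several ranks map to the same index, which is the "multiple MPI processes attaching to a single device" case discussed above.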
jand



Joined: 17 Aug 2008
Posts: 48

Posted: Mon May 13, 2013 11:36 am

Thanks Mat.

I also got a reply from NVIDIA which may be of interest to some. Apparently, overlapping the work of multiple MPI processes on one GPU will be supported in CUDA 5.5 this summer.

-Jan

Quote from Ujval Kapasi:

You can do that now even on older HW, actually. Basically, you can run a different process on each core in your node, corresponding to different MPI ranks in your application. Each process can issue work (PCI transfers and computation) to the same GPU.

However, the older hardware and software will not overlap execution of items issued by different processes. These will be handled in serial.

Hyper-Q on the K20 is better because it allows the hardware to overlap execution of items from different processes on the same node, when possible. In order to access that functionality on the K20, you will need CUDA 5.5, which has not been released yet.

When CUDA 5.5 is released this summer, it will contain support for this. You will need to run a special server process to enable the functionality, and hence you will need system administrator privileges on your node.

Ujval
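
The difference between serialized and overlapped execution described in the quote can be pictured with a toy timing model. This is an illustrative assumption, not a measurement and not a description of how the hardware schedules work: if items from different processes are handled in serial, the GPU is busy for the sum of the per-process times; if they overlap perfectly, the best case is the longest single process. Real behavior falls somewhere between the two, since overlap only happens "when possible". The function names are hypothetical.

```python
def serialized_time(per_process_ms: list) -> float:
    """Total GPU time when work from each process runs one after another."""
    return sum(per_process_ms)


def best_case_overlap_time(per_process_ms: list) -> float:
    """Lower bound on GPU time if work from all processes overlapped fully."""
    return max(per_process_ms, default=0.0)


if __name__ == "__main__":
    work = [2.0, 3.0, 5.0]  # made-up per-process GPU times in milliseconds
    print(serialized_time(work))         # 10.0
    print(best_case_overlap_time(work))  # 5.0
```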
All times are GMT - 7 Hours