PGI User Forum


hyper-q
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri May 10, 2013 9:02 am

Hi Jan,

The CUDA proxy daemon is news to me, so I'd need to refer you to NVIDIA for more information. Like one of the commenters, I'm surprised that you'd need to run it. My understanding was that you didn't need to do anything special. Note that CUDA Fortran uses the same underlying mechanisms as CUDA C, so anything that applies to CUDA C also applies to CUDA Fortran.

Back in 2011, I wrote an article on multi-GPU programming using MPI and CUDA Fortran; see http://www.pgroup.com/lit/articles/insider/v3n3a2.htm. At the time, I wrote that "setting up more than one Context on a single device is not supported." That statement is no longer correct for Kepler, since Hyper-Q allows multiple contexts. While the code from the article isn't a good performance benchmark, since it does so little work, you can use it to test multiple MPI processes attaching to a single device.

- Mat
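
For readers who want to picture the test described above, here is a minimal Python sketch of one common way to map node-local MPI ranks to GPUs. The helper names `assign_device` and `ranks_per_device` and the round-robin rule are assumptions made for illustration; they are not taken from the article. In a real CUDA Fortran or CUDA C program, the resulting index would be passed to `cudaSetDevice`.

```python
from typing import Dict, List


def assign_device(local_rank: int, num_devices: int) -> int:
    """Hypothetical helper: pick a GPU index for a node-local MPI rank."""
    if num_devices < 1:
        raise ValueError("need at least one device")
    if local_rank < 0:
        raise ValueError("local_rank must be non-negative")
    # Round-robin: ranks wrap around once every device has one process.
    return local_rank % num_devices


def ranks_per_device(num_ranks: int, num_devices: int) -> Dict[int, List[int]]:
    """Group node-local ranks by the device they would attach to."""
    mapping: Dict[int, List[int]] = {d: [] for d in range(num_devices)}
    for rank in range(num_ranks):
        mapping[assign_device(rank, num_devices)].append(rank)
    return mapping


if __name__ == "__main__":
    # Four ranks on a node with a single GPU: all of them share device 0.
    print(ranks_per_device(4, 1))  # prints {0: [0, 1, 2, 3]}
```

With one device, every rank lands on device 0, which is the "multiple MPI processes attaching to a single device" case discussed in this thread.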
jand



Joined: 17 Aug 2008
Posts: 57

Posted: Mon May 13, 2013 11:36 am

Thanks Mat.

I also got a reply from NVIDIA which may be of interest to some. Apparently, overlapping work from multiple MPI processes on one GPU will be supported in CUDA 5.5 this summer.

-Jan

Quote from Ujval Kapasi:

You can do that now even on older HW, actually. Basically, you can run a different process on each core in your node, corresponding to different MPI ranks in your application. Each process can issue work (PCI transfers and computation) to the same GPU.

However, the older hardware and software will not overlap execution of items issued by different processes. These will be handled in serial.

However, HyperQ on K20 is better because it allows the hardware to overlap execution of items from different processes on the same node, when possible. To access that functionality on K20, you will need CUDA 5.5, which has not been released yet.

When CUDA 5.5 is released this summer, it will contain support for this. You will need to run a special server process to enable the functionality, and hence you will need system administrator privileges on your node.

Ujval
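
The difference between serial handling and overlapped execution described in the quote can be pictured with a toy timing model in Python. This is only a sketch under strong assumptions: the function names are hypothetical, the durations are made-up numbers, and the "ideal overlap" case assumes the GPU has enough free resources to run every process's work fully concurrently, which real hardware will often not allow.

```python
from typing import Dict, List


def serialized_time(work: Dict[int, List[float]]) -> float:
    """Items from different processes never overlap: total is the plain sum."""
    return sum(sum(items) for items in work.values())


def ideal_overlap_time(work: Dict[int, List[float]]) -> float:
    """Best case: each process runs its own items back to back,
    while different processes overlap completely."""
    return max((sum(items) for items in work.values()), default=0.0)


if __name__ == "__main__":
    # Made-up durations (arbitrary units), keyed by MPI rank.
    work = {0: [2.0, 1.0], 1: [1.5], 2: [0.5, 0.5]}
    print(serialized_time(work))     # prints 5.5
    print(ideal_overlap_time(work))  # prints 3.0
```

Actual run time with overlap enabled would fall somewhere between these two bounds, depending on how much of the work the hardware can run concurrently.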
danah



Joined: 23 Nov 2009
Posts: 1

Posted: Fri Jan 17, 2014 12:54 pm    Post subject: Updated article (since 2011) or example code

Mat,

You mention that back in 2011 you wrote an article on multi-GPU programming using MPI and CUDA Fortran. Do you have a more recent article, or an example that runs with CUDA 5.5?

Thanks, dana
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Jan 17, 2014 1:19 pm

Hi Dana,

Thanks for your interest. Sorry, but no, I haven't updated the article recently. However, the basic information is still valid and useful with CUDA 5.5.

At the time, MPI-aware GPUDirect was just being implemented. I had planned on doing a follow-up article once GPUDirect became more mature and more GPUDirect-capable NICs were available. However, for the last few years I've been focusing on OpenACC rather than CUDA Fortran, so I never got back to it. Let me talk with some of the other application engineers who do focus on CUDA Fortran and see if they can write a follow-on article.

Is there something in particular that you're interested in learning how to do?

- Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Jan 17, 2014 4:11 pm

Hi Dana,

I talked with Greg Ruetsch. His book, CUDA Fortran for Scientists and Engineers, has a chapter on using MPI with CUDA Fortran, including GPUDirect, along with source code examples that may be useful. He mentioned that the most difficult part is getting MVAPICH set up correctly, but there's a README file in the code examples which explains this.

- Mat
All times are GMT - 7 Hours