jand
Posted: Wed Apr 10, 2013 3:57 pm    Post subject: hyper-q
Hi,
I am starting to use a K20 card and am wondering whether there are any examples of how to use the Hyper-Q feature in CUDA Fortran.
My problem: I am running simulations with multiple MCMC chains in parallel (each chain is simulated by one MPI rank), and I would like those ranks to access a single K20 card simultaneously.
Thanks, Jan
mkcolg
Posted: Thu Apr 11, 2013 9:37 am
Hi Jan,
Hyper-Q just expands the number of hardware work queues, so the device can handle more streams and contexts concurrently. In CUDA there is nothing Hyper-Q specific to program; you simply use the existing streams construct and/or attach multiple host processes to a single device.
This article isn't Hyper-Q specific, but it gives an overview of asynchronous data movement and using streams: http://www.pgroup.com/lit/articles/insider/v3n1a4.htm
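For example, a bare-bones CUDA Fortran sketch of the multi-stream pattern could look like the following (the kernel, array names, and sizes are placeholders, not code from the article):
Code:
module chain_kernels
contains
  attributes(global) subroutine scale(a, s, n)
    real :: a(*)
    real, value :: s
    integer, value :: n
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = a(i) * s
  end subroutine scale
end module chain_kernels

program multistream
  use cudafor
  use chain_kernels
  implicit none
  integer, parameter :: n = 1048576, nstreams = 4
  real, allocatable, pinned :: a(:,:)       ! pinned host memory for async copies
  real, allocatable, device :: a_d(:,:)
  integer(kind=cuda_stream_kind) :: stream(nstreams)
  integer :: i, istat

  allocate(a(n,nstreams), a_d(n,nstreams))
  a = 1.0
  do i = 1, nstreams
     istat = cudaStreamCreate(stream(i))
  end do

  ! Issue copy-in, kernel, and copy-out into separate streams.  On a K20,
  ! Hyper-Q lets these streams be scheduled independently instead of being
  ! serialized through a single hardware work queue.
  do i = 1, nstreams
     istat = cudaMemcpyAsync(a_d(:,i), a(:,i), n, cudaMemcpyHostToDevice, stream(i))
     call scale<<<(n+255)/256, 256, 0, stream(i)>>>(a_d(:,i), 2.0, n)
     istat = cudaMemcpyAsync(a(:,i), a_d(:,i), n, cudaMemcpyDeviceToHost, stream(i))
  end do
  istat = cudaDeviceSynchronize()
end program multistream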
- Mat
jand
Posted: Thu May 09, 2013 4:19 pm
On the NVIDIA Dev Zone, I found this:
"In the CUDA 5.0 release:
#1 is supported and documented (http://docs.nvidia.com/cuda/kepler-tuning-guide/index.html#hyperq). There is also sample code in the simpleHyperQ example here: http://docs.nvidia.com/cuda/cuda-samples/index.html#advanced
#2 is supported on a few Cray-based systems (e.g. Titan) in the CUDA 5.0 release. We're working on productizing (testing, documentation, etc.) this feature for a wider range of hardware/software configurations in an upcoming release."
Does anyone have experience with #2 and CUDA Fortran?
Thanks, Jan
mkcolg
Posted: Thu May 09, 2013 5:04 pm
Quote: "Does anyone have experience with #2 and CUDA Fortran?"
See the article I noted above. Hyper-Q helps the device better utilize streams, so there is nothing new to program; you may simply see better performance when you use asynchronous kernels and multiple streams.
- Mat
jand
Posted: Thu May 09, 2013 6:36 pm
Hi Mat,
the first part of what I meant to quote got lost, so here it is:
"Hyper-Q refers to two related capabilities of the Tesla K20 and later GPUs:
1. concurrency, when possible, for kernels launched into different streams in the same process
2. concurrency, when possible, between kernels launched from different MPI ranks in different processes running in parallel on the same node."
(https://devtalk.nvidia.com/default/topic/529136/hyperq-and-mpi/)
I think you are referring to option 1. However, my understanding is that Hyper-Q can also be used by multiple MPI ranks without requiring any changes to existing code; apparently this is done by running a CUDA proxy server on the node.
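To make the idea concrete, here is a rough sketch of what I have in mind for each rank; the module, kernel, and array names are made up, and nothing in the code is proxy- or Hyper-Q-specific, since the sharing is supposed to be transparent to the application:
Code:
module chain_mod
contains
  attributes(global) subroutine chain_step(x, n)
    real :: x(*)
    integer, value :: n
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) x(i) = x(i) + 1.0   ! stand-in for one MCMC update
  end subroutine chain_step
end module chain_mod

program mcmc_ranks
  use cudafor
  use mpi
  use chain_mod
  implicit none
  integer, parameter :: n = 100000
  integer :: ierr, rank, istat
  real, allocatable, device :: state_d(:)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Every rank attaches to the same K20; with the proxy server running,
  ! kernels from different ranks can execute concurrently on the device.
  istat = cudaSetDevice(0)
  allocate(state_d(n))
  state_d = real(rank)              ! each rank holds its own chain state

  call chain_step<<<(n+255)/256, 256>>>(state_d, n)
  istat = cudaDeviceSynchronize()

  deallocate(state_d)
  call MPI_Finalize(ierr)
end program mcmc_ranks
Whether the kernels from the different ranks actually overlap on the device would then be up to the proxy server and Hyper-Q, not the application code.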
Thanks, Jan