mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Fri Jan 11, 2013 11:56 am Post subject: |
Quote: "Do you have a sample code on this trick?"

Not off hand. This is all done "under the hood" using CUDA and not something exposed at the user level.
- Mat
|
sjz
Joined: 09 Jan 2013 Posts: 6
Posted: Mon Jan 14, 2013 4:30 pm Post subject: cuda fortran kernel setup time? |
Hi,
I tried vector addition in PGI CUDA Fortran and saw ~0.1 second of overhead on the first run. Subsequent calls did not pay that ~0.1 second cost. So this first-call kernel setup overhead holds for CUDA Fortran as well as OpenACC. Is that right?
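The kind of measurement described can be sketched in CUDA Fortran as follows (a minimal, illustrative sketch; the kernel, array size, and host-side timer are simplifications, not the original test):

```fortran
module vecadd_mod
contains
  attributes(global) subroutine vecadd(a, b, c, n)
    real :: a(*), b(*), c(*)
    integer, value :: n
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) c(i) = a(i) + b(i)
  end subroutine
end module

program time_first_call
  use cudafor
  use vecadd_mod
  integer, parameter :: n = 1000000
  real, device :: a_d(n), b_d(n), c_d(n)
  real :: t0, t1, t2
  integer :: istat
  a_d = 1.0; b_d = 2.0
  call cpu_time(t0)
  ! First launch pays the one-time device context creation cost.
  call vecadd<<<(n + 255) / 256, 256>>>(a_d, b_d, c_d, n)
  istat = cudaDeviceSynchronize()
  call cpu_time(t1)
  ! Second launch shows the steady-state launch cost.
  call vecadd<<<(n + 255) / 256, 256>>>(a_d, b_d, c_d, n)
  istat = cudaDeviceSynchronize()
  call cpu_time(t2)
  print *, 'first call: ', t1 - t0, '  second call: ', t2 - t1
end program
```

With a sketch like this, the first timing includes context setup (roughly the ~0.1 s observed) while the second reflects only the kernel launch and execution.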
SJZ
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Jan 14, 2013 4:56 pm Post subject: |
Quote: "So this first-call kernel setup overhead holds for CUDA Fortran as well as OpenACC. Is that right?"

Correct. This is the cost to create the device context, which is the same for CUDA Fortran and OpenACC.
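One common way to keep this one-time cost out of timed code is to create the device context up front. A minimal sketch using the OpenACC runtime API (`acc_init` is part of the OpenACC specification; the loop is illustrative):

```fortran
program warmup
  use openacc
  integer :: i
  real :: x(1000)
  ! Create the device context before any timed region; the first
  ! compute construct then no longer pays the setup cost.
  call acc_init(acc_device_nvidia)
  !$acc parallel loop
  do i = 1, 1000
     x(i) = 2.0 * real(i)
  end do
  print *, x(1), x(1000)
end program
```

In CUDA Fortran, a cheap runtime call made early, such as `istat = cudaSetDevice(0)` followed by a small device allocation, serves the same warm-up purpose.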
- Mat
|
najafzha
Joined: 28 Dec 2012 Posts: 1
Posted: Wed Mar 06, 2013 10:07 am Post subject: |
The question is regarding the following comment in this post:
"For the copying of kernels, if the kernel is called multiple times in succession, then the cost to copy the kernel to the device occurs only once. However, if there are many other kernels in between calls, then there is the potential that the kernel code needs to be copied over again."
Is there any document that provides more detail on the behavior of kernel initializations and how kernels are copied/treated by the CPU/GPU? "Many other kernels in between calls": how many? Can the user control the code and the environment so that the kernel copies reside on the GPU as long as necessary? Is GPU cache/shared memory holding these copies? What does the architecture look like? To what extent can this environment (and kernel copying) be controlled?
Thanks in advance!
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Mar 06, 2013 11:58 am Post subject: |
Quote: "Is there any document that provides more detail on the behavior of kernel initializations and how kernels are copied/treated by the CPU/GPU?"

Nothing from us, since this can't be controlled by the user and the behavior can change depending on the target device and the underlying tools being used.

Quote: "'Many other kernels in between calls': how many?"

It's my understanding that if the kernel is already in the device's kernel queue, it doesn't need to be reinitialized, though the length of that queue varies by device. The arguments to the kernel still need to be copied to the device; it's just the kernel binary itself that doesn't.

Quote: "Can the user control the code and the environment so that the kernel copies reside on the GPU as long as necessary?"

Not that I'm aware of. There might be something in CUDA to make a kernel persistent, but I'm not sure.

Quote: "Is GPU cache/shared memory holding these copies?"

No.

Quote: "What does the architecture look like?"

Which architecture? This article is a few years old, but gives a high-level view of Fermi and Tesla: http://www.pgroup.com/lit/articles/insider/v2n1a5.htm

Quote: "To what extent can this environment (and kernel copying) be controlled?"

The end user can control the size and number of kernels created, as well as whether kernels are launched asynchronously. However, you have no control over how kernels are copied to the device.
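The asynchronous-launch control mentioned above can be sketched in CUDA Fortran with streams (the kernel, sizes, and stream count here are illustrative, not a prescribed pattern):

```fortran
module scale_mod
contains
  attributes(global) subroutine scale(x, s, n)
    real :: x(*)
    real, value :: s
    integer, value :: n
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) x(i) = s * x(i)
  end subroutine
end module

program async_demo
  use cudafor
  use scale_mod
  integer, parameter :: n = 2**20
  real, device :: x_d(n), y_d(n)
  integer(kind=cuda_stream_kind) :: s1, s2
  integer :: istat
  x_d = 1.0; y_d = 2.0
  istat = cudaStreamCreate(s1)
  istat = cudaStreamCreate(s2)
  ! Independent kernels on different streams may execute concurrently;
  ! launches return to the host immediately (asynchronously).
  call scale<<<(n + 255) / 256, 256, 0, s1>>>(x_d, 2.0, n)
  call scale<<<(n + 255) / 256, 256, 0, s2>>>(y_d, 3.0, n)
  istat = cudaDeviceSynchronize()
end program
```

Whether the kernels actually overlap depends on the device, but the host-side launch cost is decoupled from kernel execution either way.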
- Mat
|