PGI User Forum


performance of PGI openacc directives
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Fri Jan 11, 2013 11:56 am

Quote:
Do you have a sample code on this trick?
Not off hand. This is all done "under the hood" using CUDA and not something exposed at the user level.

- Mat
sjz



Joined: 09 Jan 2013
Posts: 9

Posted: Mon Jan 14, 2013 4:30 pm    Post subject: cuda fortran kernel setup time?

Hi,

I tried vector addition in PGI CUDA Fortran, and the first run had about 0.1 second of overhead. After that first call, the remaining calls did not incur the ~0.1 second cost. So this kernel setup overhead for the first call is true for CUDA Fortran as well as OpenACC. Is that right?


SJZ
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Jan 14, 2013 4:56 pm

Quote:
So this kernel setup overhead for the first call is true for cuda fortran as well as openacc. Is that right?
Correct. This is the cost of creating the device context, which is the same for CUDA and OpenACC.
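One way to see this first-call cost is to time each call separately. The following is only a minimal sketch in C with an OpenACC directive, not code from this thread: the names `vec_add`, `now_seconds`, and `time_vec_add` are made up for illustration, and the timer assumes a C11 library that provides `timespec_get`. A compiler without OpenACC enabled ignores the pragma and runs the loop on the host, in which case no first-call spike should be expected.

```c
#include <time.h>

/* Element-wise c = a + b. With an OpenACC compiler the loop is offloaded;
   without OpenACC support the pragma is ignored and the loop runs on the host. */
void vec_add(const float *a, const float *b, float *c, int n)
{
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: calls vec_add `calls` times and stores the elapsed
   time of each call in seconds[0..calls-1]. On an accelerator, seconds[0]
   would be expected to include the one-time device context creation. */
void time_vec_add(const float *a, const float *b, float *c,
                  int n, int calls, double *seconds)
{
    for (int k = 0; k < calls; ++k) {
        double t0 = now_seconds();
        vec_add(a, b, c, n);   /* synchronous: returns after the copy-out */
        seconds[k] = now_seconds() - t0;
    }
}
```

Comparing `seconds[0]` with the later entries should show the gap described above. To keep that cost out of the timed region, the OpenACC runtime routine `acc_init` (declared in `openacc.h`) can be called once before timing starts.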

- Mat
najafzha



Joined: 28 Dec 2012
Posts: 1

Posted: Wed Mar 06, 2013 10:07 am

The question is regarding the following comment in this post:

"For the copying of kernels, if the kernel is called multiple times in succession, then the cost to copy the kernel to the device occurs only once. However, if there are many other kernels in between calls, then there is the potential that the kernel code needs to be copied over again."

Is there any document that provides more detail on the behavior of kernel initializations and how they are copied/treated by the CPU/GPU? "many other kernels in between calls", how many? Can the user control the code and the environment such that the kernel copies can reside on the GPU as long as necessary? Is the GPU cache/shared memory holding these copies? What does the architecture look like? To what extent can this environment (and kernel copying) be controlled?

Thanks in advance!
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Wed Mar 06, 2013 11:58 am

Quote:
Is there any document that provides more detail on the behavior of kernel initializations and how they are copied/treated by the CPU/GPU?
Nothing from us, since these details can't be controlled by the user and can change depending on the target device and the underlying tools being used.

Quote:
"many other kernels in between calls", how many?
It's my understanding that if the kernel is in the device's kernel queue, then it doesn't need to be reinitialized, though the length of the queue will vary by device. The arguments to the kernel still need to be copied over to the device; only the kernel binary itself doesn't need to be copied again.

Quote:
Can the user control the code and the environment such that the kernel copies can reside on the GPU as long as necessary?
Not that I'm aware of. There might be something in CUDA to have the kernel be persistent, but I'm not sure.

Quote:
Is the GPU cache/shared memory holding these copies?
No.

Quote:
What does the architecture look like?
Which architecture? This article is a few years old, but it gives a high-level view of Fermi and Tesla: http://www.pgroup.com/lit/articles/insider/v2n1a5.htm

Quote:
To what extent can this environment (and kernel copying) be controlled?
The end user can control the size and number of kernels created, as well as whether the kernels are launched asynchronously. However, you have no control over how kernels are copied to the device.
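As a rough illustration of the parts that are under user control, here is a sketch in C with OpenACC directives. The function name `scale_two` and the values 64 and 128 are arbitrary choices for the example, not recommendations. Each `parallel loop` becomes its own kernel, `num_gangs` and `vector_length` request a launch shape, and the `async` clauses place the two kernels on separate queues so they may overlap.

```c
/* Scales x and y in place using two independent kernels.
   Without an OpenACC compiler the pragmas are ignored and both
   loops simply run one after the other on the host. */
void scale_two(float *x, float *y, int n, float s)
{
    /* Keep both arrays on the device for the duration of the region. */
    #pragma acc data copy(x[0:n], y[0:n])
    {
        /* Kernel 1: queued on async queue 1, returns control immediately. */
        #pragma acc parallel loop num_gangs(64) vector_length(128) async(1)
        for (int i = 0; i < n; ++i)
            x[i] *= s;

        /* Kernel 2: queued on async queue 2, independent of kernel 1. */
        #pragma acc parallel loop num_gangs(64) vector_length(128) async(2)
        for (int i = 0; i < n; ++i)
            y[i] *= s;

        /* Block until all queued work finishes, before the copy-out
           that happens at the end of the data region. */
        #pragma acc wait
    }
}
```

Splitting the work into more or fewer `parallel` regions changes how many kernels are created, and the clauses change their size and launch behavior. None of this affects how the kernel binary itself reaches the device.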

- Mat
All times are GMT - 7 Hours