PGI User Forum


Performance of OpenACC data create
PGI User Forum Forum Index -> Accelerator Programming
odlantern



Joined: 30 Aug 2010
Posts: 17

Posted: Wed Jun 19, 2013 3:10 pm    Post subject: Performance of OpenACC data create

I've noticed that the data create(...) clause is very slow: it takes nearly as long as data copyin(...), and in some cases even longer. I tested this separately on a GeForce 2000M and on a GeForce Titan, both under Windows 7 with PGI 13.6 (64-bit) using the CUDA 5 runtime.

I'm a bit surprised by this because, from what I understand, the create clause should only allocate memory on the GPU, without any data copies (per page 17 of the OpenACC specification), whereas the copyin clause both allocates memory _and_ copies data from the CPU to the GPU. The arrays I'm allocating are around 50-100 MB.

Why does the create clause take so long, and is there a way around it? Right now, this is the main bottleneck in my application.

Thanks.
~David
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Wed Jun 19, 2013 3:14 pm

Hi David,

Create shouldn't take longer than copyin, so I would need an example to understand what's going on. Is this code you can share, or can you create a reproducing example? If so, please either post it or send it to PGI Customer Support (trs@pgroup.com) and ask them to forward it to me.

Thanks,
Mat
odlantern



Joined: 30 Aug 2010
Posts: 17

Posted: Wed Jun 19, 2013 3:40 pm

At some point, I'll try to come up with a simple example that I can send.

I dug into this a little further using the Nsight profiler/trace tool, and it looks like the create clause ultimately calls these two CUDA routines for each array:

cuMemAlloc_v2

and

cuMemHostRegister

cuMemAlloc_v2 is very fast, but cuMemHostRegister takes over 30x longer than cuMemAlloc_v2.

So it appears that this has to do with page-locked (pinned) memory? Since I am not copying the data to or from the CPU, this seems unnecessary.

~David
odlantern



Joined: 30 Aug 2010
Posts: 17

Posted: Wed Jun 19, 2013 3:52 pm

Copyin calls:

cuMemAlloc_v2
cuMemHostRegister
and
cuMemcpyHtoDAsync_v2

So copyin should take longer than create. It is possible that the cases I saw where create took a bit longer than copyin were a profiler/timing artifact. In most cases the two are nearly identical, indicating that cuMemHostRegister takes up the vast majority of the time and overshadows cuMemAlloc_v2 and cuMemcpyHtoDAsync_v2.
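The breakdown above can be written as a rough cost model. This is a sketch that assumes the calls run back to back with no other significant runtime overhead; t_alloc, t_reg and t_copy are labels introduced here for the cuMemAlloc_v2, cuMemHostRegister and cuMemcpyHtoDAsync_v2 times, and t_reg ≈ 30 t_alloc takes the ratio from the earlier trace (which reported over 30x):

```latex
T_{\text{create}} = t_{\text{alloc}} + t_{\text{reg}}, \qquad
T_{\text{copyin}} = t_{\text{alloc}} + t_{\text{reg}} + t_{\text{copy}}

t_{\text{reg}} \approx 30\,t_{\text{alloc}}
\;\Rightarrow\;
\frac{t_{\text{reg}}}{T_{\text{create}}} \approx \frac{30}{31} \approx 0.97,
\qquad
\frac{T_{\text{copyin}}}{T_{\text{create}}} \approx 1 + \frac{t_{\text{copy}}}{31\,t_{\text{alloc}}}
```

Under these assumptions registration accounts for roughly 97% of create, and copyin looks nearly identical to create whenever t_copy is small compared with 31 t_alloc.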
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Thu Jun 20, 2013 12:00 pm

Hi Dave,

Try setting the environment variable "PGI_ACC_SYNCHRONOUS=1".

In the 13.x compilers we updated the runtime to use pinned memory by default (which in turn helps asynchronous data movement). However, in some cases this can slow down code, because the NVIDIA routines that handle pinned memory can be quite a bit slower. That said, I mostly see the slowdown when freeing the memory, not when allocating it, so I'm not positive it's the same issue.

We are in the process of revamping how we handle asynchronous data movement, so if this is the same issue, hopefully we'll have it improved shortly.


- Mat
All times are GMT - 7 Hours
Page 1 of 2