PGI User Forum


Update Vs CopyIn

 
_sayan_



Joined: 07 Apr 2012
Posts: 29

Posted: Mon Jun 04, 2012 3:41 pm    Post subject: Update Vs CopyIn

Hello,

In my 3D FD code, I previously had some copyins like this:

Code:

copyin(u0,u1,alpha,beta,gamma)


This was taking quite some time, so I decided to use mirror allocations (for u0, u1 and alpha) and update directives to synchronize host and device. Now my code looks like:
Code:

copyin(beta,gamma)
...
update device(u0)
...
update host(u1)
update host(beta)
...

I noticed a substantial improvement in performance. So my question is: is it generally better to limit the initial copyin/copyout clauses on the data region and instead rely on mirrored allocations and update directives to synchronize CPU and GPU?

Thank you,
Sayan
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Jun 05, 2012 9:46 am    Post subject: Re: Update Vs CopyIn

Hi Sayan,

The "copyin" clause causes the variables to be copied to the device at the point in the program where the clause appears. With "mirror", it's entirely up to the user to copy data via the "update" directive. There is some overhead in allocating the data on the device, but overall there shouldn't be much performance difference between the two if they are structured the same way. In this case, the two versions don't appear to be equivalent, and it's these differences that are causing the improved performance.
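To make the distinction concrete, here is a minimal sketch in C. It is an illustration under assumptions, not Sayan's actual code: the 1D stencil stands in for the 3D FD kernel, the function names (smooth_copyin, device_alloc, smooth_update, device_free) are hypothetical, and it uses the later OpenACC "enter data"/"update" syntax as a stand-in for the PGI Accelerator "mirror" directive. A compiler without accelerator support will simply ignore the pragmas and run both variants on the host.

```c
/* Variant A: structured data region.
   u0 and alpha are copied to the device every time the region is entered,
   and u1 is copied in and back out, whether or not the data changed. */
void smooth_copyin(const float *u0, float *u1, const float *alpha, int n)
{
    #pragma acc data copyin(u0[0:n], alpha[0:n]) copy(u1[0:n])
    {
        #pragma acc parallel loop
        for (int i = 1; i < n - 1; ++i)
            u1[i] = u0[i] + alpha[i] * (u0[i - 1] - 2.0f * u0[i] + u0[i + 1]);
    }
}

/* Variant B: device copies that persist across calls (the role "mirror"
   plays in the PGI Accelerator model). Allocation and the initial
   transfer happen once, here. */
void device_alloc(const float *u0, const float *u1, const float *alpha, int n)
{
    #pragma acc enter data copyin(u0[0:n], u1[0:n], alpha[0:n])
}

/* Only the arrays that actually changed are moved: u0 goes to the device,
   u1 comes back. alpha stays resident and is never re-sent. */
void smooth_update(const float *u0, float *u1, const float *alpha, int n)
{
    #pragma acc update device(u0[0:n])
    #pragma acc parallel loop present(u0[0:n], u1[0:n], alpha[0:n])
    for (int i = 1; i < n - 1; ++i)
        u1[i] = u0[i] + alpha[i] * (u0[i - 1] - 2.0f * u0[i] + u0[i + 1]);
    #pragma acc update host(u1[0:n])
}

/* Release the persistent device copies. */
void device_free(const float *u0, const float *u1, const float *alpha, int n)
{
    #pragma acc exit data delete(u0[0:n], u1[0:n], alpha[0:n])
}
```

If smooth_copyin is called once per time step, alpha is re-sent on every call; with device_alloc called once and smooth_update called per step, it is sent only once. A difference of this kind, rather than "copyin" versus "update" as such, is the sort of thing that could explain the speed-up.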

What you can do is profile the code to see how and where the two versions differ. Set the environment variable "CUDA_PROFILE=1" before you run your code. This will create a CUDA profile file with the timings for every device call.

Alternatively, you can use the PGI "pgcollect" utility to get a mixed host and device profile. However, pgcollect aggregates the timings, while the CUDA profiler lists every call individually.

- Mat
All times are GMT - 7 Hours