King!
Joined: 06 Mar 2013 Posts: 1
Posted: Wed Mar 06, 2013 4:15 am Post subject: How to force synchronous send using OpenACC data construct
My code keeps sending data asynchronously when transferring it from the host to the device. According to the NVIDIA profiler, the data is sent in 1MB chunks.
My data clause, placed just before the time loop, is shown below:
#pragma acc data copyin(U2[0:p.domain_size],U3[0:p.domain_size],source[0:nt], coef[0:five]), copy(U1[0:p.domain_size])
for (i=0; i<nt; i+=2) { // time loop
...
My code uses the arrays U1, U2, and U3 for computation, and the output is in U1 after the time loop.
The copyin transfers are done asynchronously, while the copyout is synchronous. How do I make my code send the data synchronously in one chunk so that data movement is efficient?
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Fri Mar 08, 2013 3:44 pm
Hi King!
In 12.x, the OpenACC runtime copies the user data to a 1MB pinned buffer, then transfers that buffer asynchronously. In 13.1-13.3, the OpenACC runtime pins the user memory itself, so it should send the data in a single contiguous chunk. However, we have recently found cases where pinning the memory is more costly than copying it in chunks. Hence, we may need to go back to using the pinned buffer, at least for some cases.
- Mat
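To make the 12.x behavior described above more concrete, here is a minimal host-only sketch of copying through a fixed-size staging buffer. It is an illustration only, not the PGI runtime's code: `staged_copy` and `STAGE_BYTES` are hypothetical names, the staging buffer is ordinary `malloc` memory rather than pinned memory, and `memcpy` stands in for the host-to-device transfer that the real runtime would issue asynchronously.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical staging size, chosen to match the 1MB chunks seen in the profiler. */
#define STAGE_BYTES ((size_t)1 << 20)

/*
 * Copy nbytes from src to dst through a fixed-size staging buffer.
 * Returns the number of chunks transferred (0 if nbytes is 0 or the
 * staging buffer cannot be allocated).
 *
 * Assumption: the second memcpy stands in for the device transfer;
 * a real runtime would use pinned memory and an asynchronous copy here.
 */
size_t staged_copy(void *dst, const void *src, size_t nbytes)
{
    unsigned char *stage = malloc(STAGE_BYTES);
    size_t offset = 0;
    size_t chunks = 0;

    if (stage == NULL)
        return 0;

    while (offset < nbytes) {
        size_t n = nbytes - offset;
        if (n > STAGE_BYTES)
            n = STAGE_BYTES;                                    /* at most one stage per chunk */

        memcpy(stage, (const unsigned char *)src + offset, n);  /* user data -> staging buffer */
        memcpy((unsigned char *)dst + offset, stage, n);        /* staging buffer -> destination */

        offset += n;
        chunks++;
    }

    free(stage);
    return chunks;
}
```

Under these assumptions, a 2.5MB array goes out as three transfers, which is the kind of pattern a profiler would report as 1MB chunks. Pinning the user array directly, as described for 13.1-13.3, removes the first copy but adds the cost of the pinning itself.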