PGI User Forum
assignment (device->host) performance issue

 
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Tue Feb 15, 2011 1:05 pm    Post subject: assignment (device->host) performance issue

Suppose I have two arrays
Quote:

arr_host

arr_dev

one residing on the host and one on the device. If I do a whole-array copy assignment like
Code:
arr_host = arr_dev

there is no performance penalty (with only a slight difference between arr_host in regular memory and in pinned memory). In my code, each copy moves 22MB, so it takes 10min (vs 13min with pinned memory). However, if I specify the index range

Code:
arr_host = arr_dev(1:size(arr_host))


or

Code:
arr_host = arr_dev(padding+1:)
! given that arr_dev was allocated bigger than arr_host


there is a dramatic performance difference (about 3 times slower). So I think PGI should revise how copy assignment handles array sections.

Thanks,
Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue Feb 15, 2011 5:36 pm    Post subject:

Hi Tuan,

We're aware of this issue and are making progress.

- Mat
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Wed Feb 16, 2011 8:27 am    Post subject:

mkcolg wrote:
Hi Tuan,

We're aware of this issue and are making progress.

- Mat


Is there any workaround for now, Mat?

Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Wed Feb 16, 2011 11:48 am    Post subject:

Hi Tuan,
Quote:
Is there any workaround for now, Mat?

Avoid using array sections and only copy the entire array.

Using array sections forces the compiler to generate multiple copies since there isn't a general way at compile time to know the best method to copy the data. It's a very difficult problem since array sections can be defined by any number of expressions that can only be evaluated at runtime. What we're working on now is a way to determine at runtime the optimal way to copy the data. Finding a general solution will take some time.

- Mat
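
The workaround above can be sketched in CUDA Fortran roughly as follows. This is only an illustration under assumptions: the sizes `n` and `padding` and the extra host buffer `staging` are hypothetical (the thread says only that arr_dev is allocated larger than arr_host). The idea is to keep the device-to-host assignment a whole-array copy and take the section afterwards on the host.

```fortran
program whole_copy_workaround
  use cudafor
  implicit none

  ! Hypothetical sizes; the thread does not give the real ones.
  integer, parameter :: n = 1024
  integer, parameter :: padding = 32

  real, device, allocatable :: arr_dev(:)   ! device array, larger than arr_host
  real, allocatable :: staging(:)           ! hypothetical host buffer, same size as arr_dev
  real, allocatable :: arr_host(:)          ! final host destination
  integer :: i

  allocate(arr_dev(padding + n))
  allocate(staging(padding + n), arr_host(n))

  ! Give the device array known contents so the result can be checked.
  staging = [(real(i), i = 1, padding + n)]
  arr_dev = staging
  staging = 0.0

  ! Whole-array assignment: no array section on the device side.
  staging = arr_dev

  ! The section is taken on the host, as an ordinary Fortran array copy.
  arr_host = staging(padding+1:)

  print *, 'section matches: ', all(arr_host == [(real(padding + i), i = 1, n)])

  deallocate(arr_dev)
  deallocate(staging, arr_host)
end program whole_copy_workaround
```

The cost of this approach is an extra host buffer the size of arr_dev, plus transferring the padding elements that are then discarded, so it is likely to pay off only when the unwanted part of arr_dev is small relative to the whole array.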
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Wed Feb 16, 2011 2:53 pm    Post subject:

How about using the CUDA API, i.e. cudaMemcpy() or related calls?
Do they have the same problem?

Thanks,
Tuan
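
The thread ends without an answer to this question, so whether the API call avoids the slowdown is not established here. For reference, such a call in CUDA Fortran might look like the sketch below. The sizes `n` and `padding` and the helper array `init` are hypothetical. The sketch assumes the `cudafor` form of `cudaMemcpy`, in which the count is given in elements rather than bytes and the starting element of each array marks where the copy begins.

```fortran
program memcpy_offset_sketch
  use cudafor
  implicit none

  ! Hypothetical sizes; the thread does not give the real ones.
  integer, parameter :: n = 1024
  integer, parameter :: padding = 32

  real, device, allocatable :: arr_dev(:)   ! device array, larger than arr_host
  real, allocatable :: arr_host(:)          ! host destination
  real, allocatable :: init(:)              ! hypothetical host data used to fill arr_dev
  integer :: i, istat

  allocate(arr_dev(padding + n))
  allocate(arr_host(n), init(padding + n))

  ! Give the device array known contents so the result can be checked.
  init = [(real(i), i = 1, padding + n)]
  arr_dev = init

  ! Copy n elements, starting at arr_dev(padding+1), into arr_host.
  ! The count is a number of elements, not bytes.
  istat = cudaMemcpy(arr_host(1), arr_dev(padding+1), n)
  if (istat /= cudaSuccess) print *, 'cudaMemcpy failed: ', cudaGetErrorString(istat)

  print *, 'copy matches: ', all(arr_host == init(padding+1:))

  deallocate(arr_dev)
  deallocate(arr_host, init)
end program memcpy_offset_sketch
```

Because the source range is a single contiguous block, this is one explicit transfer of `n` elements. Timing it against the array-section assignment on the actual data would be the way to find out whether it helps.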
All times are GMT - 7 Hours