PGI User Forum

CUDA-x86.

PVF 13.7 can't compile the same codes
mkcolg



Joined: 30 Jun 2004
Posts: 6142
Location: The Portland Group Inc.

Posted: Thu Sep 05, 2013 2:52 pm

Quote:
But the results were wrong! Furthermore, the second error "invalid arguments" occurred while calling the "subroutine OKgrid_rain2" again. The error was thrown out while launching the kernel "gpu_assign_B2_matrix". I will try to figure out.
Ok, let us know if you need help.

- Mat
cyfengMIT



Joined: 07 Mar 2012
Posts: 22

Posted: Fri Sep 06, 2013 2:54 am

Hi Mat,

As usual, I checked the data uploaded to the GPU device. The following assignment, which gave correct results in PVF 10.9, gives wrong results in PVF 13.7.
Code:
dgy(1:ndem)=DEMdata(1:ndem)%Y

The array "dgy" is a device variable and "DEMdata" is a host variable. Is this issue similar to the one that was corrected in PVF 12.5? (http://www.pgroup.com/userforum/viewtopic.php?p=11577&highlight=#11577)

Feng
mkcolg



Joined: 30 Jun 2004
Posts: 6142
Location: The Portland Group Inc.

Posted: Fri Sep 06, 2013 1:49 pm

Hi Feng,

It's possible that it's related, since the last issue had to do with some optimization of data transfers that was added around then. Please send in a reproducing example.

Though, I'm wondering if you really want to do it this way. The "Y" members are not contiguous in memory, so each Y would need to be copied to the device separately. You might want to consider coalescing Y on the host and then sending it over in one contiguous block.
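
For reference, the coalescing approach could look roughly like the sketch below. It is only an illustration under assumptions: the derived type "dem_point", its X and Z components, the host buffer "hy", the array "check", and the size of "ndem" are all made up here, since the thread only shows DEMdata(1:ndem)%Y and dgy. It also assumes a CUDA Fortran compiler and a CUDA-capable device.

```fortran
! Sketch only. dem_point, its X/Z components, hy, check, and the value of
! ndem are hypothetical; the thread shows only DEMdata(:)%Y, dgy, and ndem.
module dem_types
  implicit none
  type dem_point
    real :: X, Y, Z
  end type dem_point
end module dem_types

program coalesce_y
  use cudafor
  use dem_types
  implicit none
  integer, parameter :: ndem = 1000            ! assumed problem size
  type(dem_point), allocatable :: DEMdata(:)   ! host array of derived type
  real, allocatable            :: hy(:)        ! contiguous host staging buffer
  real, allocatable            :: check(:)     ! host copy used for verification
  real, device, allocatable    :: dgy(:)       ! device array
  integer :: i

  allocate(DEMdata(ndem), hy(ndem), check(ndem), dgy(ndem))

  ! Fill the host data with recognizable values.
  do i = 1, ndem
    DEMdata(i)%X = 0.0
    DEMdata(i)%Y = real(i)
    DEMdata(i)%Z = 0.0
  end do

  ! Step 1: gather the strided Y components into a contiguous host array.
  hy(1:ndem) = DEMdata(1:ndem)%Y

  ! Step 2: a single contiguous host-to-device copy.
  dgy(1:ndem) = hy(1:ndem)

  ! Copy back to the host and compare against the staging buffer.
  check(1:ndem) = dgy(1:ndem)
  if (all(check == hy)) then
    print *, 'transfer OK'
  else
    print *, 'transfer mismatch'
  end if

  deallocate(DEMdata, hy, check, dgy)
end program coalesce_y
```

The extra host-side gather costs one pass over DEMdata, but the device copy then moves one contiguous block instead of ndem separate elements.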

- Mat
cyfengMIT



Joined: 07 Mar 2012
Posts: 22

Posted: Fri Sep 06, 2013 7:27 pm

Hi Mat,

Yep! You are right. It doesn't make sense to expect the compiler to automatically copy all of the non-contiguous data into contiguous memory on the device. I am modifying the code to coalesce Y on the host first. Thanks for your advice.

Feng

All times are GMT - 7 Hours