PGI User Forum
Array of structures vs structure of arrays

 
KarlWilkinson85254



Joined: 17 Jan 2013
Posts: 9

Posted: Mon Nov 23, 2015 6:30 am    Post subject: Array of structures vs structure of arrays

Hi,

Sorry if this repeats a previous post; I recall a discussion along these lines but couldn't find it.

The package I work on has recently undergone a significant rewrite in order to use derived types and I am merging those changes with the GPU implementation of the code.

Unfortunately, I have had to hack my way around a host-based "array of structures" by copying to a temporary array*. Other cases, where the derived type is used directly, copy without issue; it is only this array of structures that is a problem.

I was hoping you could give me some brief feedback/links to further information regarding expected issues and best practises for derived types and OpenACC/CUDA Fortran.

Cheers,

Karl


*Just to clarify, only a subarray of the derived type is passed to the GPU, not the entire structure. However, this is then passed to the routine that does the host>device copy as follows:

Code:
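subroutine routineA
   ! Pass only the array component of the derived type, not the whole structure
   call routineB(derivedtype%array)
end subroutine routineA

subroutine routineB(array)
   ! Host-to-device copy; the remaining arguments are elided
   istat = cudaMemcpyAsync(array, etc...)
end subroutine routineB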


Debugging the code that uses derived types in this manner produces a memcpy error that suggests a problem with the host array, as its memory location is 0x0.
mkcolg



Joined: 30 Jun 2004
Posts: 6616
Location: The Portland Group Inc.

Posted: Mon Nov 23, 2015 1:27 pm

Hi Karl,

There really isn't a best practices guide for derived types as of yet. Greg Ruetsch has a section on it in the second edition of his CUDA Fortran book, but that is still being written, so it is not publicly available.

For OpenACC, if your derived type contains dynamic data members (which I assume is the case here), then the standard isn't quite ready. It's one of the major items for the next OpenACC standard, but for now support is a bit piecemeal, depending on the compiler you're using.

For both cases, you might want to try using CUDA Unified Memory. It's only available for dynamic memory, and you're limited in the amount of memory your program can use, but it works well in these types of cases and greatly simplifies your programming effort.

CUDA Fortran: https://www.pgroup.com/lit/articles/insider/v6n1a2.htm
OpenACC: https://www.pgroup.com/lit/articles/insider/v6n2a4.htm
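As a rough illustration of the Unified Memory route in CUDA Fortran, here is a minimal sketch. It assumes a compiler with CUDA Fortran support (for example `pgfortran -Mcuda`), and the names `scale_mod`, `scale`, and `managed_demo` are made up for the example; they are not from the code discussed above. The array is declared `managed` and `allocatable`, so the same variable is used on the host and in the kernel with no explicit copy.

```fortran
module scale_mod
  use cudafor
  implicit none
contains
  ! Hypothetical kernel: multiply each element of a by s.
  attributes(global) subroutine scale(a, n, s)
    integer, value :: n
    real, value :: s
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = s * a(i)
  end subroutine scale
end module scale_mod

program managed_demo
  use cudafor
  use scale_mod
  implicit none
  integer, parameter :: n = 1024
  ! Managed memory must be dynamic, hence the allocatable attribute.
  real, managed, allocatable :: a(:)
  integer :: istat

  allocate(a(n))
  a = 1.0                                   ! initialise from the host
  call scale<<<(n + 255) / 256, 256>>>(a, n, 2.0)
  istat = cudaDeviceSynchronize()           ! wait before touching a on the host
  print *, 'a(1) =', a(1), ' a(n) =', a(n)
  deallocate(a)
end program managed_demo
```

The synchronize call matters: with managed data the host should not read the array until the kernel has finished.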

For your example CUDA Fortran code, make sure "array" has the "device" attribute and that "routineb" includes an interface with the "array" argument having "device" as well.
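A minimal sketch of that arrangement, under some assumptions: the routine is placed in a module so its interface is explicit, the device buffer is passed as a separate `device` dummy argument, and the host data lives in an allocatable component of a derived type. The names `copy_mod`, `container_t`, `d_array`, `h_array`, and the `stream` argument are hypothetical and only stand in for the elided parts of the original code.

```fortran
module copy_mod
  use cudafor
  implicit none

  ! Hypothetical stand-in for the derived type in the question.
  type :: container_t
     real, allocatable :: array(:)
  end type container_t

contains

  ! Being a module procedure gives callers an explicit interface,
  ! so the "device" attribute on d_array is visible at the call site.
  subroutine routineB(d_array, h_array, n, stream)
    integer, intent(in) :: n
    real, device        :: d_array(n)
    real, intent(in)    :: h_array(n)
    integer(kind=cuda_stream_kind), intent(in) :: stream
    integer :: istat

    ! count is given in elements, not bytes, in CUDA Fortran
    istat = cudaMemcpyAsync(d_array, h_array, n, cudaMemcpyHostToDevice, stream)
    if (istat /= cudaSuccess) print *, 'copy failed: ', cudaGetErrorString(istat)
  end subroutine routineB
end module copy_mod

program copy_demo
  use cudafor
  use copy_mod
  implicit none
  integer, parameter :: n = 1000
  type(container_t) :: derivedtype
  real, device, allocatable :: d_array(:)
  real :: check(n)
  integer(kind=cuda_stream_kind) :: stream
  integer :: istat

  allocate(derivedtype%array(n))
  allocate(d_array(n))
  derivedtype%array = 3.0

  istat = cudaStreamCreate(stream)
  call routineB(d_array, derivedtype%array, n, stream)   ! pass only the component
  istat = cudaStreamSynchronize(stream)

  check = d_array                                        ! device-to-host copy
  print *, 'max abs diff:', maxval(abs(check - derivedtype%array))
  istat = cudaStreamDestroy(stream)
end program copy_demo
```

Note that the host component is allocated before it is passed, and that a host buffer generally needs the `pinned` attribute for the copy to be truly asynchronous; with ordinary pageable memory, as here, it is still correct but may not overlap with other work.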

- Mat
All times are GMT - 7 Hours