PGI User Forum

OpenMP + CUDA Fortran issue

 
bdkaplin



Joined: 17 Aug 2010
Posts: 3

Posted: Sat Feb 05, 2011 8:11 pm    Post subject: OpenMP + CUDA Fortran issue

While combining CUDA Fortran with OpenMP, I ran into problems with declaring device variables private in a parallel region. If I did not declare a device variable private, each CPU thread correctly assigned itself a device, but a copy error was thrown as soon as the second thread tried to access the device variable. When I did declare these variables private, the CPU threads did not correctly assign themselves devices; the error number was 36, "Active Process Error". The main difference I noticed from Mat's post in http://www.pgroup.com/userforum/viewtopic.php?t=2350 was that his variables were declared allocatable, while mine were declared with fixed sizes.

Making all device variables (including scalars!) allocatable, declaring them private, and then allocating them within the parallel construct resolved both the Copy Access Error and the Active Process Error. Is there an easier way around these issues? It seems that declaring a non-allocatable device variable private in a parallel construct counts as the initial access to a device and locks all CPU threads onto a single card, regardless of whether any data was transferred before calling cudaSetDevice().
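The workaround described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the program name, the arrays `a_d` and `a_h`, the size `n`, and the one-thread-per-GPU mapping are all assumptions made for the example.

```fortran
program omp_multi_gpu
  use cudafor
  use omp_lib
  implicit none

  integer, parameter :: n = 1024        ! hypothetical problem size
  ! Allocatable device data: nothing is placed on a GPU at program entry,
  ! so no device storage is requested before the parallel region starts.
  real, device, allocatable :: a_d(:)
  real    :: a_h(n)
  integer :: istat, ndev, tid

  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess .or. ndev < 1) stop 'no CUDA device found'

  ! One CPU thread per GPU (assumed mapping); each thread gets its own,
  ! initially unallocated, private copy of a_d.
  !$omp parallel num_threads(ndev) private(a_d, a_h, istat, tid)
  tid = omp_get_thread_num()

  ! Bind this CPU thread to its own GPU before touching any device data.
  istat = cudaSetDevice(tid)

  ! First device use by this thread: the allocation lands on the
  ! device selected above.
  allocate(a_d(n))

  a_h = real(tid)
  a_d = a_h          ! host -> device copy
  a_h = a_d          ! device -> host copy
  if (any(a_h /= real(tid))) print *, 'thread', tid, ': round trip mismatch'

  deallocate(a_d)
  !$omp end parallel
end program omp_multi_gpu
```

With the PGI compilers this would typically be built with both CUDA Fortran and OpenMP enabled, for example `pgfortran -Mcuda -mp`.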

If there is a better way to approach this, please let me know. I can't put everything into a subroutine, because the GPUs need to swap data at each timestep. I also cannot use MPI for this portion because it is a subsection of a much larger code.

Brian
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Feb 07, 2011 5:45 pm

Hi Brian,

A device context is created upon first use of a device. So if you have a static device variable declared at the start of your routine, the context is created upon entry since space needs to be allocated on the device to hold these variables. As you discovered, this doesn't work if you later enter an OpenMP region since multiple threads can't share a context.

What you need to do is delay the creation of the device variables until after each OpenMP thread has created its context. As you note, this can be done by making private copies of the variables and having each thread allocate them after the context is created. A second option is to move your device code into a subroutine that gets called by each OpenMP thread.

Quote:

I can't put everything into a subroutine, because the GPUs need to swap data at each timestep
Can you pass in your shared host variables? Granted, I don't know your code, but it seems like you should still be able to swap host data around even if it's from a subroutine.
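A minimal sketch of this subroutine-based approach, under stated assumptions: the module `gpu_step`, the kernel `scale`, the host array `field_h`, and the boundary exchange are hypothetical and only illustrate the structure. Device data is local to the subroutine, so it is created only after the calling thread has selected its device, while the swap between timesteps goes through the shared host array.

```fortran
module gpu_step
  use cudafor
  implicit none
contains
  ! Trivial stand-in kernel for the real device work (hypothetical).
  attributes(global) subroutine scale(a, n, s)
    integer, value :: n
    real, value    :: s
    real           :: a(n)
    integer        :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = a(i) * s
  end subroutine scale

  ! Called by each OpenMP thread after it has called cudaSetDevice.
  subroutine step_on_device(n, slab_h)
    integer, intent(in) :: n
    real, intent(inout) :: slab_h(n)
    real, device, allocatable :: slab_d(:)   ! local device data
    allocate(slab_d(n))                      ! allocated on the caller's device
    slab_d = slab_h                          ! host -> device
    call scale<<<(n + 255) / 256, 256>>>(slab_d, n, 2.0)
    slab_h = slab_d                          ! device -> host
    deallocate(slab_d)
  end subroutine step_on_device
end module gpu_step

program swap_via_host
  use cudafor
  use omp_lib
  use gpu_step
  implicit none
  integer, parameter :: n = 1024, nsteps = 4   ! hypothetical sizes
  real, allocatable  :: field_h(:,:)           ! shared host data, one column per GPU
  integer :: istat, ndev, tid, step, d
  real    :: tmp

  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess .or. ndev < 1) stop 'no CUDA device found'
  allocate(field_h(n, ndev))
  field_h = 1.0

  !$omp parallel num_threads(ndev) private(tid, istat, step, d, tmp)
  tid = omp_get_thread_num()
  istat = cudaSetDevice(tid)                   ! select the device first
  do step = 1, nsteps
    call step_on_device(n, field_h(:, tid + 1))
    !$omp barrier
    ! One thread swaps boundary values between neighbouring columns on the
    ! host; the implicit barrier at "end single" holds the others back.
    !$omp single
    do d = 1, ndev - 1
      tmp = field_h(n, d)
      field_h(n, d) = field_h(1, d + 1)
      field_h(1, d + 1) = tmp
    end do
    !$omp end single
  end do
  !$omp end parallel
end program swap_via_host
```

In this sketch the device array does not persist between calls, so every timestep pays for a full copy in and out. Whether that cost is acceptable depends on the real code.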

- Mat
All times are GMT - 7 Hours