PGI User Forum

Invalid context error with OMP & GPU

Kim AKeating12934



Joined: 14 Jan 2010
Posts: 7

Posted: Tue Oct 12, 2010 12:28 pm    Post subject: Invalid context error with OMP & GPU

I have the following code, which repeats a large number of calculations for every element in vector y and returns the results in vector z. The main program makes numerous calls to this subroutine, which compiles without error and appears to execute without a problem.
Code:

subroutine SingleGPU(nd, nx, ny, x, y, z)
use accel_lib
integer :: nd, nx, ny, i, j
real :: x(nd,nx), y(nd,ny), z(ny), v(nx), p(nx)
!$acc region do private(j, v, p)
do i = 1, ny
   p = 1.0
   do j = 1, nd
      v = y(j,i) - x(j,1:nx)
      p = p * ( .9375 * (1.0 - v**2)**2 * (abs(v) < 1.0) )
   end do
   z(i) = sum(p)
end do
!$acc end region
return
end subroutine SingleGPU

However, I have three C2050s and would like to use all of them. To spread the workload among multiple accelerators, I modified the code as follows.
Code:

subroutine MultiGPU(nd, nx, ny, x, y, z)
use accel_lib
use omp_lib
integer :: nd, nx, ny, i, ilo, ihi, j, ndevices
real :: x(nd,nx), y(nd,ny), z(ny), v(nx), p(nx)
ndevices = acc_get_num_devices(acc_device_nvidia)
!$omp parallel private(i, ilo, ihi, j, v, p, y, x) num_threads(ndevices)
call acc_set_device_num(omp_get_thread_num(), acc_device_nvidia)
ilo = omp_get_thread_num() * (ny/ndevices + 1) + 1
ihi = min(ny, ilo + (ny/ndevices + 1) - 1)
!$acc region do private(j, v, p)
do i = ilo, ihi
   p = 1.0
   do j = 1, nd
      v = y(j,i) - x(j,1:nx)
      p = p * ( .9375 * (1.0 - v**2)**2 * (abs(v) < 1.0) )
   end do
   z(i) = sum(p)
end do
!$acc end region
!$omp end parallel
return
end subroutine MultiGPU

Within the accelerator region, the only difference between this and the first version of the code is the addition of the variables ilo and ihi to divide the workload among the available devices. I've checked omp_get_thread_num(), ilo, and ihi prior to entering the accelerator region. All are returning the expected values. This code compiles fine and appears to execute fine the first time it is called, but when called a second time it fails and returns the following message:

call to cuModuleGetFunction returned error 201: Invalid context
CUDA driver version: 3010

I'm at a loss. Can someone please help me understand what's going on here?
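
For readers who want to check the work split on its own, here is a minimal sketch of the ilo/ihi arithmetic from MultiGPU as a standalone routine. The module name chunking and the subroutine chunk_bounds are hypothetical names introduced only for this illustration; the formulas are the ones in the post, with tid standing in for omp_get_thread_num().

```fortran
! Hypothetical helper (not part of the original post): returns the index
! range that the posted formulas assign to OpenMP thread tid (0-based).
module chunking
  implicit none
contains
  subroutine chunk_bounds(tid, ndevices, ny, ilo, ihi)
    integer, intent(in)  :: tid, ndevices, ny
    integer, intent(out) :: ilo, ihi
    integer :: chunk
    chunk = ny/ndevices + 1          ! integer division, as in the post
    ilo = tid*chunk + 1              ! first index owned by this thread
    ihi = min(ny, ilo + chunk - 1)   ! ihi < ilo means an empty range
  end subroutine chunk_bounds
end module chunking
```

With ny = 10 and ndevices = 3 this yields the ranges 1..4, 5..8, and 9..10, so the ranges are contiguous and cover 1..ny. The last thread can receive a shorter range, or an empty one when ny is small, in which case its do loop simply does not execute.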
brentl



Joined: 20 Jul 2004
Posts: 108

Posted: Wed Oct 13, 2010 2:06 pm

It might be a problem calling acc_set_device_num() more than once. Try putting that in a conditional so it only happens once.
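
One way the "only once" conditional could be sketched is with a per-thread flag. Everything named here (device_guard, device_bound, bind_device_once, and the stand-in stub_device module) is hypothetical and not from the thread. The sketch assumes that the OpenMP runtime keeps threadprivate data alive between parallel regions (which requires the same thread count and dynamic threads disabled) and that a device binding made by a thread stays valid for later regions. Neither assumption is confirmed in this thread.

```fortran
! Stand-in for the real device call so the sketch can run without a GPU.
! With the PGI runtime, a wrapper with this interface would instead do
!   call acc_set_device_num(n, acc_device_nvidia)
! The counter is not thread-safe; it exists only for illustration.
module stub_device
  implicit none
  integer :: ncalls = 0
contains
  subroutine fake_set_device(n)
    integer, intent(in) :: n
    ncalls = ncalls + 1
  end subroutine fake_set_device
end module stub_device

module device_guard
  implicit none
  ! One flag per OpenMP thread: has this thread already bound a device?
  logical, save :: device_bound = .false.
  !$omp threadprivate(device_bound)
contains
  subroutine bind_device_once(devnum, set_device)
    integer, intent(in) :: devnum
    interface
      subroutine set_device(n)
        integer, intent(in) :: n
      end subroutine set_device
    end interface
    ! Call the device-selection routine only on this thread's first visit.
    if (.not. device_bound) then
      call set_device(devnum)
      device_bound = .true.
    end if
  end subroutine bind_device_once
end module device_guard
```

Inside the parallel region of MultiGPU, the direct call to acc_set_device_num would then become something like call bind_device_once(omp_get_thread_num(), my_set_device), where my_set_device is a small user-written wrapper around the real runtime call. Whether this removes the error depends on the assumptions above holding for the runtime in use.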
Kim AKeating12934



Joined: 14 Jan 2010
Posts: 7

Posted: Thu Oct 14, 2010 8:36 am


Brent, thanks so much for your help. I assumed that acc_set_device_num() could be called anytime outside an accelerator region, so that processing could be redirected at any point and as often as needed. As you've suggested, however, that is not the case. I convinced myself of this by inserting a call to acc_shutdown() just before ending the omp thread. The program then runs without the error, but so slowly that I'd be better off confining all work to a single device. As currently written, the program can't work as intended if I place the call to acc_set_device_num() in a conditional, as you suggested. Guess I'll need to revise the flow of work in the main program.

Maybe it's only me, but this seems to be a real limitation. Can anyone from PGI comment on the chances of improving on this in future revisions?
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Thu Oct 14, 2010 4:29 pm

Hi Kim,

I'll put in a feature request asking for a runtime function that checks whether a device context has already been created, or possibly for acc_set_device to be a no-op if the context is already set.

Thanks,
Mat
Kim AKeating12934



Joined: 14 Jan 2010
Posts: 7

Posted: Fri Oct 15, 2010 8:31 am


Thanks, Mat. That would be very helpful.
All times are GMT - 7 Hours