PGI User Forum


OpenACC and OpenGL pointers

 
mmikalsen
Joined: 30 Aug 2012
Posts: 3

Posted: Thu Mar 21, 2013 11:11 am    Post subject: OpenACC and OpenGL pointers

Hi

I'm currently working on a project involving a fluid simulation computed with OpenACC and visualized with OpenGL (the flowing fluid is displayed in 2D as a density field). All calculations are performed on data arrays allocated in device memory with the acc_malloc runtime function, to minimize host-to-device data transfers.
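
For illustration, the allocation looks roughly like this (a sketch; the array names here are simplified from the actual code):
Code:

#include <openacc.h>

/* (N+2) x (N+2) grid: N x N interior cells plus a one-cell boundary. */
int size = (N + 2) * (N + 2);

/* Allocate the simulation fields directly in device memory so they never
   round-trip through the host between time steps. */
float *density  = (float *)acc_malloc(size * sizeof(float));
float *velocity = (float *)acc_malloc(size * sizeof(float));

/* ... run the simulation, passing these raw device pointers to
   regions marked with the deviceptr() clause ... */

acc_free(density);
acc_free(velocity);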

To render the fluid I'm using a pixel buffer object (PBO) mapped to a pointer. This pointer is passed to an OpenACC kernels region, where data from one of the acc_malloc'd arrays is copied into the PBO, which effectively results in a device-to-device copy.

All this worked fine with the PGI OpenACC compiler 12.8, but after changing to version 13.3 I get the error "call to cuMemHostRegister returned error 999: Unknown" when passing the OpenGL PBO pointer to the OpenACC region. When I use a host-allocated pointer instead it works fine, but that means I have to perform a device-to-host copy and then a host-to-device copy instead of a single device-to-device copy.
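
For reference, the host-staging fallback that does work looks roughly like this (a sketch; the staging buffer is illustrative):
Code:

#include <stdlib.h>
#include <string.h>

/* Stage through a host buffer instead of writing the PBO directly from the
   OpenACC region: one device-to-host copy (via copyout) plus one
   host-to-device copy (via the mapped PBO). */
float4 *staging = (float4 *)malloc((N + 2) * (N + 2) * sizeof(float4));

densityToColor(staging, density, N);   /* copyout fills the host buffer */

float4 *output = (float4 *)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB,
                                          GL_WRITE_ONLY_ARB);
if (output) {
    memcpy(output, staging, (N + 2) * (N + 2) * sizeof(float4));
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);
}
free(staging);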

Any advice/explanation would be greatly appreciated!

The OpenGL code:
Code:

float4 *output;

/* Map the currently bound PBO into the client address space. */
output = (float4 *)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);

if (output) {
    /* Get results from the fluid simulation; runs the densityToColor kernel. */
    renderFluid(&config, output);
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);
}
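
For context, the PBO itself is created during setup along these lines (setup code not shown above; a sketch):
Code:

GLuint pbo;
glGenBuffersARB(1, &pbo);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
/* Reserve buffer storage: one float4 per grid cell, borders included. */
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB,
                (N + 2) * (N + 2) * sizeof(float4),
                NULL, GL_STREAM_DRAW_ARB);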



The kernel code:
Code:

void densityToColor(float4 *output, const float *density, int N) {
    /* density is a raw device pointer from acc_malloc; output is the mapped
       PBO pointer, which the copyout clause treats as host memory.
       IX(i,j) maps 2D grid coordinates to a 1D array index. */
    #pragma acc kernels deviceptr(density) copyout(output[0:(N+2)*(N+2)])
    {
        #pragma acc loop independent
        for (int j = 1; j <= N; j++) {
            #pragma acc loop independent
            for (int i = 1; i <= N; i++) {
                /* Just copy data from density to output right now;
                   might do some color manipulation later. */
                float densityValue = density[IX(i,j)];
                float4 f4ptr;
                f4ptr.x = densityValue;
                f4ptr.y = densityValue;
                f4ptr.z = densityValue;
                f4ptr.w = densityValue;
                output[IX(i,j)] = f4ptr;
            }
        }
    }
}



Here is some of the debug output when running with PGI_ACC_DEBUG=1:
Code:


pgi_uacc_begin( compute region, file=/home/mmikalsen/pgicode/fluid2dacc/fluid2d.c, function=densityToColor, lines=49:64, startline=50, endline=64, devid=0, threadid=1 )
pgi_uacc_begin( file=/home/mmikalsen/pgicode/fluid2dacc/fluid2d.c, function=densityToColor, lines=49:64, startline=50, endline=64, devid=1, threadid=1 ) dindex=1
pgi_uacc_dataon( devid=1, threadid=1 )
pgi_uacc_dataon(devptr=0x13c7010,hostptr=0x2aaaae989e00,offset=0,stride=1,size=66564,extent=-1,eltsize=16,lineno=50,name=output,flags=0xb00=create+present+copyout,threadid=1)
pgi_uacc_dataon( devid=1, threadid=1 ) dindex=1
NO map for host:0x2aaaae989e00
pgi_uacc_alloc(size=1065024,devid=1,threadid=1)
pgi_uacc_alloc(size=1065024,devid=1,threadid=1) returns 0x600500000
map    dev:0x600500000 host:0x2aaaae989e00 size:1065024 offset:0  data[dev:0x600500000 host:0x2aaaae989e00 size:1065024] (line:50 name:output)
alloc done with devptr at 0x600500000
pgi_uacc_pin(devptr=0x0,hostptr=0x2aaaae989e00,offset=0,stride=1,size=66564,extent=-1,eltsize=16,lineno=50,name=output,flags=0x0,threadid=1)
MemHostRegister( 0x2aaaae989e00, 1065024, 0 )
call to cuMemHostRegister returned error 999: Unknown
Starting simulation
make: *** [run] Error 1

mkcolg
Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Fri Mar 22, 2013 12:29 pm

Hi mmikalsen,

In the 13.1 compilers, the runtime started using pinned memory by default, with the goal of improving asynchronous data copies and overall data-transfer performance. However, since OpenGL memory is mapped separately from other CUDA memory, I suspect this change is interfering with your usage.

We're looking at adding a flag and/or environment variable to revert to the old behaviour of transferring data through 1MB staging buffers instead of pinned memory. Hopefully that will allow you to continue using the OpenGL mapped memory.
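
In the meantime, one possible workaround (a sketch, untested; it assumes you can mix the CUDA runtime interop API with the OpenACC runtime in the same context) is to register the PBO with CUDA and hand the resulting device pointer to the region through deviceptr(), so the runtime never tries to pin the mapped memory:
Code:

#include <cuda_gl_interop.h>

/* Hypothetical variant of densityToColor that writes straight into the
   PBO's device memory, bypassing the runtime's pinning path. */
void densityToColorInterop(GLuint pbo, const float *density, int N) {
    cudaGraphicsResource_t pboRes;
    float4 *d_output;
    size_t numBytes;

    /* Registration would normally happen once at setup. */
    cudaGraphicsGLRegisterBuffer(&pboRes, pbo,
                                 cudaGraphicsRegisterFlagsWriteDiscard);
    cudaGraphicsMapResources(1, &pboRes, 0);
    cudaGraphicsResourceGetMappedPointer((void **)&d_output, &numBytes, pboRes);

    /* Both pointers are device pointers, so no data clauses are needed. */
    #pragma acc kernels deviceptr(density, d_output)
    {
        #pragma acc loop independent
        for (int j = 1; j <= N; j++) {
            #pragma acc loop independent
            for (int i = 1; i <= N; i++) {
                float v = density[IX(i,j)];
                float4 c;
                c.x = v; c.y = v; c.z = v; c.w = v;
                d_output[IX(i,j)] = c;
            }
        }
    }

    cudaGraphicsUnmapResources(1, &pboRes, 0);
    cudaGraphicsUnregisterResource(pboRes);
}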

Would you be able to share your code with us? OpenGL memory-mapped data isn't something we directly support, so we don't test it. If possible, I'd like to add your code to our test system so that this won't regress again.

Thanks,
Mat
mmikalsen
Joined: 30 Aug 2012
Posts: 3

Posted: Tue Mar 26, 2013 12:11 pm

Thanks for the reply, Mat.

The code generates a fluid flow that looks like smoke being emitted from the center of the simulation grid. It's based on GPU Gems Chapter 38, "Fast Fluid Dynamics Simulation on the GPU".

The code can be found in the following github repository:

https://github.com/magnuami/fluid2doacc
mkcolg
Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Tue Mar 26, 2013 2:09 pm

Thanks mmikalsen,

I'm short on time today but will take a look later in the week.

- Mat