PGI User Forum

cuMemcpyDtoH error 99

 
amunozf



Joined: 15 Apr 2012
Posts: 1

Posted: Wed May 16, 2012 4:05 pm    Post subject: cuMemcpyDtoH error 99

Hi!
I'm testing the OpenACC directives with the following function:

void update(int***restrict solid, float****restrict fp, float***restrict rho, float***restrict ux, float ***restrict uy, float***restrict uz, int nx, int ny, int nz, float mass, int time_total)
{
    int x,y,z,t;
    for (t=0; t<=time_total; t++)
    {
        mass = 0.0;

        #pragma acc kernels loop present(solid[0:nx-1][0:ny-1][0:nz-1], fp[0:nx-1][0:ny-1][0:nz-1][0:18], rho[0:nx-1][0:ny-1][0:nz-1], ux[0:nx-1][0:ny-1][0:nz-1], uy[0:nx-1][0:ny-1][0:nz-1], uz[0:nx-1][0:ny-1][0:nz-1])

        for (x=0; x<nx; x++)
        {
            for (y=0; y<ny; y++)
            {
                for (z=0; z<nz; z++)
                {
                    if (!solid[x][y][z])
                    {
                        rho[x][y][z] = fp[x][y][z][0]+fp[x][y][z][1]+fp[x][y][z][2]+fp[x][y][z][3]+fp[x][y][z][4]+fp[x][y][z][5]+fp[x][y][z][6]+fp[x][y][z][7]+fp[x][y][z][8]+fp[x][y][z][9]+fp[x][y][z][10]+fp[x][y][z][11]+fp[x][y][z][12]+fp[x][y][z][13]+fp[x][y][z][14]+fp[x][y][z][15]+fp[x][y][z][16]+fp[x][y][z][17]+fp[x][y][z][18];

                        ux[x][y][z] = (fp[x][y][z][1]+fp[x][y][z][2]+fp[x][y][z][8]-fp[x][y][z][4]-fp[x][y][z][5]-fp[x][y][z][6]+fp[x][y][z][15]+fp[x][y][z][18]-fp[x][y][z][16]-fp[x][y][z][17])/rho[x][y][z];

                        uy[x][y][z] = (fp[x][y][z][2]+fp[x][y][z][3]+fp[x][y][z][4]-fp[x][y][z][6]-fp[x][y][z][7]-fp[x][y][z][8]+fp[x][y][z][9]+fp[x][y][z][14]-fp[x][y][z][11]-fp[x][y][z][12])/rho[x][y][z];

                        uz[x][y][z] = (fp[x][y][z][9]+fp[x][y][z][10]+fp[x][y][z][11]-fp[x][y][z][12]-fp[x][y][z][13]-fp[x][y][z][14]+fp[x][y][z][15]+fp[x][y][z][16]-fp[x][y][z][17]-fp[x][y][z][18])/rho[x][y][z];

                        mass += rho[x][y][z];
                    }
                }
            }
        }

    }
}


The function is called from main with:

#pragma acc data copy(s[0:nx-1][0:ny-1][0:nz-1], f[0:nx-1][0:ny-1][0:nz-1][0:18], r[0:nx-1][0:ny-1][0:nz-1], uxx[0:nx-1][0:ny-1][0:nz-1], uyy[0:nx-1][0:ny-1][0:nz-1], uzz[0:nx-1][0:ny-1][0:nz-1])
{
update(s, f, r, uxx, uyy, uzz, nx, ny, nz, mass, time_total);
}


The info messages and error I got are the following:

PGI$ pgcc -acc -Minfo=accel -fast lbm3Dacc4.c
NOTE: your trial license will expire in 10 days, 6.38 hours.
update:
268, Generating present(uz[0:nx-1][0:ny-1][0:nz-1])
Generating present(uy[0:nx-1][0:ny-1][0:nz-1])
Generating present(ux[0:nx-1][0:ny-1][0:nz-1])
Generating present(rho[0:nx-1][0:ny-1][0:nz-1])
Generating present(fp[0:nx-1][0:ny-1][0:nz-1][0:18])
Generating present(solid[0:nx-1][0:ny-1][0:nz-1])
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
269, Loop is parallelizable
271, Loop is parallelizable
273, Loop is parallelizable
Accelerator kernel generated
269, #pragma acc loop gang, vector(4) /* blockIdx.y threadIdx.z */
271, #pragma acc loop gang, vector(4) /* blockIdx.x threadIdx.y */
273, #pragma acc loop vector(16) /* threadIdx.x */
CC 1.0 : 24 registers; 1136 shared, 72 constant, 0 local memory bytes; 33% occupancy
CC 2.0 : 32 registers; 1032 shared, 136 constant, 0 local memory bytes; 66% occupancy
282, Sum reduction generated for mass
main:
368, Generating copy(uzz[0:nx-1][0:ny-1][0:nz-1])
Generating copy(uyy[0:nx-1][0:ny-1][0:nz-1])
Generating copy(uxx[0:nx-1][0:ny-1][0:nz-1])
Generating copy(r[0:nx-1][0:ny-1][0:nz-1])
Generating copy(f[0:nx-1][0:ny-1][0:nz-1][0:18])
Generating copy(s[0:nx-1][0:ny-1][0:nz-1])


PGI$ lbm3Dacc4.exe
Calculating...
call to cuMemcpyDtoH returned error 999: Unknown
CUDA driver version: 4000


I tried to strictly follow the program structure of the OpenACC example file "acc_c3a.c". I should note that I allocated the arrays with calloc instead of malloc.

I would appreciate any advice about this error. Regards!
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Thu May 17, 2012 4:33 pm

Hi amunozf,

It looks like the compiler is functioning correctly, but there is some issue with your driver. Our default is to use CUDA 4.0, so the generated code should work fine with a 4.0 driver; I'm not sure why it doesn't.

What is the output from the command "pgaccelinfo"? Can you try updating your driver to the latest CUDA development version (http://developer.nvidia.com/cuda-downloads)?
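For reference, the "CUDA driver version: 4000" line in the run output is a packed version number. CUDA encodes versions as 1000*major + 10*minor, so 4000 corresponds to 4.0. A minimal sketch of the decoding follows; the function name cuda_version_parts is made up for illustration and is not part of any CUDA or PGI API.

```c
#include <assert.h>

/* Split a packed CUDA version (1000*major + 10*minor) into its parts.
   Hypothetical helper for illustration only. */
void cuda_version_parts(int version, int *major, int *minor)
{
    assert(version >= 0 && major != 0 && minor != 0);
    *major = version / 1000;
    *minor = (version % 1000) / 10;
}
```

Called with 4000 this yields major 4 and minor 0; called with 4020 it yields major 4 and minor 2.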

Thanks,
Mat
amunozflores



Joined: 03 Nov 2011
Posts: 1

Posted: Sat May 26, 2012 2:19 pm    Post subject: Still getting "cuMemcpyDtoH returned error 999: Unknown"

Hello!

After upgrading the drivers as suggested, we still weren't able to fix the cuMemcpyDtoH error. Below are the results we got:

PGI$ pgcc -acc lbm3Dacc4.c
NOTE: your trial license will expire in 0 days, 7.86 hours.
PGI$ lbm3Dacc4.exe
Calculating...
call to cuMemcpyDtoH returned error 999: Unknown
CUDA driver version: 4020
PGI$ pgaccelinfo
CUDA Driver Version: 4020

Device Number: 0
Device Name: GeForce GTX 580
Device Revision Number: 2.0
Global Memory Size: 1610285056
Number of Multiprocessors: 16
Number of Cores: 512
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 49152
Registers per Block: 32768
Warp Size: 32
Maximum Threads per Block: 1024
Maximum Block Dimensions: 1024, 1024, 64
Maximum Grid Dimensions: 65535 x 65535 x 65535
Maximum Memory Pitch: 2147483647B
Texture Alignment: 512B
Clock Rate: 1544 MHz
Execution Timeout: Yes
Integrated Device: No
Can Map Host Memory: Yes
Compute Mode: default
Concurrent Kernels: Yes
ECC Enabled: No
Memory Clock Rate: 2004 MHz
Memory Bus Width: 384 bits
L2 Cache Size: 786432 bytes
Max Threads Per SMP: 1536
Async Engines: 1
Unified Addressing: No
Current free memory: 1433202688
Upload time (4MB): 940 microseconds ( 620 ms pinned)
Download time: 940 microseconds ( 780 ms pinned)
Upload bandwidth: 4462 MB/sec (6765 MB/sec pinned)
Download bandwidth: 4462 MB/sec (5377 MB/sec pinned)
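The bandwidth lines above appear to follow directly from the 4 MB transfer times. The sketch below assumes that "4MB" means 4*1024*1024 bytes and that "MB/sec" means 10^6 bytes per second; both are assumptions inferred from the numbers, not taken from the pgaccelinfo documentation. Under those assumptions the pinned figures only work out if the values printed as "ms pinned" are read as microseconds. The function name is hypothetical.

```c
#include <assert.h>

/* Bandwidth in MB/sec (10^6 bytes per second) for a transfer of `bytes`
   that took `microseconds`. Bytes per microsecond is numerically equal
   to MB per second, so a single integer division is enough. */
unsigned long bandwidth_mb_per_sec(unsigned long bytes,
                                   unsigned long microseconds)
{
    assert(microseconds > 0);
    return bytes / microseconds;
}
```

With bytes = 4*1024*1024, a time of 940 reproduces the 4462 MB/sec figure, and times of 620 and 780 reproduce the pinned figures of 6765 and 5377 MB/sec.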

Because we ran out of time, we'll look to future releases of the acc engine.

Regards,

Antony
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Tue May 29, 2012 4:07 pm

Hi Antony,

Having a complete example would be useful, since I'm not sure how your arrays are declared or allocated. One major difference between the PGI Accelerator Model and OpenACC is that in the PGI model the copy clauses take the range to copy (lower and upper bound), while in OpenACC they take the starting element and the number of elements to copy. So in your case, the error may be caused by using "nx-1", "ny-1", and "nz-1", assuming that your arrays have nx, ny, and nz elements. Also, OpenACC requires arrays to be contiguous, not arrays of pointers.
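As a rough illustration of both points, here is a minimal sketch, not the original program. It assumes the grid holds nx*ny*nz elements and is stored as one flat, contiguous block. The helper names section_length, alloc_grid, and fluid_mass are made up for this example. A compiler without OpenACC support simply ignores the pragma and runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of elements in the inclusive range lower..upper. A PGI
   Accelerator Model section a[lower:upper] corresponds to the OpenACC
   section a[lower:section_length(lower, upper)]. */
size_t section_length(size_t lower, size_t upper)
{
    assert(upper >= lower);
    return upper - lower + 1;
}

/* Allocate an nx*ny*nz grid as one contiguous, zero-initialized block
   instead of an array of pointers (hypothetical helper). */
float *alloc_grid(int nx, int ny, int nz)
{
    return calloc((size_t)nx * ny * nz, sizeof(float));
}

/* Sum rho over all non-solid cells of a flat grid with n elements. */
float fluid_mass(const float *restrict rho, const int *restrict solid,
                 size_t n)
{
    float mass = 0.0f;

    /* OpenACC sections are [start:length], so the whole array is
       [0:n], not [0:n-1]. */
    #pragma acc parallel loop copyin(rho[0:n], solid[0:n]) reduction(+:mass)
    for (size_t i = 0; i < n; i++) {
        if (!solid[i])
            mass += rho[i];
    }
    return mass;
}
```

Under the same reading, a section written as [0:nx-1] in the posted code would become [0:nx], and [0:18] would become [0:19] if fp holds 19 values per cell (indices 0 through 18, as the loop body suggests).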

- Mat
All times are GMT - 7 Hours