PGI User Forum

Internal compiler error for simple OpenACC parallel loop?

 
PGI User Forum Forum Index -> Accelerator Programming
David Henty



Joined: 16 May 2012
Posts: 2

Posted: Wed May 16, 2012 9:18 am    Post subject: Internal compiler error for simple OpenACC parallel loop?

I am trying to use OpenACC to parallelise a very simple image processing code that I use as an example on a number of parallel programming courses.

Using "pgcc -acc -Minfo=accel -c dosharpen.c" I get a whole bunch of errors like:

/tmp/pgaccj63cB3vdLOXa.gpu(104): error: expression must have arithmetic or enum type
...
ending in:

19 errors detected in the compilation of "/tmp/pgnvd173cFGXA-4Gj.nv0".

The loop isn't parallelised although the compiler analysis is exactly what I would expect:

     97, Generating copyin(scale)
         Generating copyin(norm)
         Generating copyin(sigmad4)
         Generating copyin(filter0)
         Generating copyin(d4)
         Generating copyin(d)
         Generating copyin(sharp[0:][0:])
         Generating copy(fuzzy[0:][0:])
    105, Loop is parallelizable
    107, Loop is parallelizable
    109, Loop carried dependence of 'sharp' prevents parallelization
         Loop carried backward dependence of 'sharp' prevents vectorization
    111, Complex loop carried dependence of 'sharp' prevents parallelization
         Loop carried dependence of 'sharp' prevents parallelization
         Loop carried backward dependence of 'sharp' prevents vectorization
         Inner sequential loop scheduled on accelerator
         Accelerator kernel generated
        105, #pragma acc loop gang, vector(16) /* blockIdx.y threadIdx.y */
        107, #pragma acc loop gang, vector(16) /* blockIdx.x threadIdx.x */
        109, #pragma acc loop seq
        111, #pragma acc loop seq

Any ideas what's going on? I get the same errors if I use "parallel" rather than "kernels".

The loop is appended. Note that the code is quite verbose and inefficient but this is because it is meant to be a training example for beginners.

Thanks,

David
------------------------------------------
#pragma acc data copy(fuzzy) copyin(sharp, d, d4, filter0, sigmad4, norm, scale)
{

#pragma acc kernels loop private(i, j, k, l, rd4sq, rsq, sigmad4sq, sigmasq, \
                                 rsq, delta, filter, filter0)
  {
    for (i=0; i < nx; i++)
      {
        for (j=0; j < ny; j++)
          {
            for (k=-d; k <= d; k++)
              {
                for (l= -d; l <= d; l++)
                  {
                    rd4sq = d4*d4;
                    rsq = d*d;

                    sigmad4sq = sigmad4*sigmad4;
                    sigmasq = sigmad4sq * (rsq/rd4sq);

                    rsq = 1.0*i*i + 1.0*j*j;
                    delta = rsq/(2.0*sigmasq);

                    filter = filter0 * (1.0-delta) * exp(-delta);

                    sharp[i][j] = sharp[i][j]
                                  + filter*fuzzy[i+d+k][j+d+l];
                  }
              }
          }
      }
  }
}
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Thu May 17, 2012 4:41 pm

Hi David.

The "expression must have arithmetic or enum type" error is a known issue in the OpenACC beta (TPR#18694). It was only reported last week, so it missed the deadline for 12.5, but hopefully we can have a fix in place for 12.6.

As for the "Loop carried dependence" messages, the compiler is correct: the two inner loops are not parallelizable because every (k, l) iteration updates the same element sharp[i][j], which creates a loop-carried dependence on "sharp".

Since sharp is being used as a sum reduction, to parallelize the inner loops, you need to use the "reduction" clause. However since "reduction" only works with scalars, you need to create a temp scalar value to sum the intermediate value and then store the result back into sharp. Something along the lines of:

Code:
/* sharp is updated on the device, so it is copied back; fuzzy and the scalars are read-only input */
#pragma acc data copy(sharp) copyin(fuzzy, d, d4, filter0, sigmad4, norm, scale)
{

/* filter0 is read inside the loop, so it must not be listed in private() */
#pragma acc kernels loop private(i, j, k, l, rd4sq, rsq, sigmad4sq, sigmasq, \
                                 delta, filter, sum)
  {
    for (i=0; i < nx; i++)
      {
        for (j=0; j < ny; j++)
          {
            sum = 0.0;  /* reset the scalar accumulator for each pixel */
#pragma acc loop reduction(+:sum)
            for (k=-d; k <= d; k++)
              {
                for (l= -d; l <= d; l++)
                  {
                    rd4sq = d4*d4;
                    rsq = d*d;

                    sigmad4sq = sigmad4*sigmad4;
                    sigmasq = sigmad4sq * (rsq/rd4sq);

                    rsq = 1.0*i*i + 1.0*j*j;
                    delta = rsq/(2.0*sigmasq);

                    filter = filter0 * (1.0-delta) * exp(-delta);

                    sum = sum + filter*fuzzy[i+d+k][j+d+l];
                  }
              }
            sharp[i][j] += sum;  /* store the reduced value back into the array */
          }
      }
  }
}

The caveat is that inner loop reductions are not quite working as well as we'd like. Currently, you'll get some dependency messages on the reduction variable and you'll need to explicitly set the loop schedule on the outer loops.
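
For illustration, explicitly setting the schedule on the outer loops might look roughly like the sketch below. It reuses the "gang, vector(16)" schedule that -Minfo reported for lines 105 and 107 and combines it with the scalar reduction. The fixed sizes NX, NY and D, the function name sharpen_scheduled, and the precomputed weight table are assumptions made only to keep the sketch self-contained; they are not taken from dosharpen.c.

```c
#include <assert.h>

/* Hypothetical fixed sizes, chosen only so the sketch is self-contained. */
#define NX 8
#define NY 8
#define D  2

/* weight[][] is an assumed table of precomputed filter values;
   it stands in for the per-iteration filter calculation in the post. */
void sharpen_scheduled(double sharp[NX][NY],
                       double fuzzy[NX + 2*D][NY + 2*D],
                       double weight[2*D + 1][2*D + 1])
{
    int i, j, k, l;

    assert(sharp != 0 && fuzzy != 0 && weight != 0);

#pragma acc kernels copy(sharp[0:NX][0:NY]) \
                    copyin(fuzzy[0:NX+2*D][0:NY+2*D], weight[0:2*D+1][0:2*D+1])
    {
        /* Explicit schedule on the two outer loops, matching the -Minfo output. */
#pragma acc loop gang, vector(16)
        for (i = 0; i < NX; i++) {
#pragma acc loop gang, vector(16)
            for (j = 0; j < NY; j++) {
                double sum = 0.0;   /* scalar accumulator, private to each (i, j) */

#pragma acc loop reduction(+:sum)
                for (k = -D; k <= D; k++) {
                    for (l = -D; l <= D; l++) {
                        sum += weight[k + D][l + D] * fuzzy[i + D + k][j + D + l];
                    }
                }
                sharp[i][j] += sum; /* one store per pixel instead of one per (k, l) */
            }
        }
    }
}
```

Without -acc the pragmas are ignored and the function runs serially on the host, which is a convenient way to check the arithmetic before moving to the accelerator.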

- Mat
David Henty



Joined: 16 May 2012
Posts: 2

Posted: Tue May 22, 2012 8:01 am    Post subject: Re: Internal compiler error for simple OpenACC parallel loop?

Mat,

Thanks for the reply - I'll wait until 12.6 before taking this example forward!

I should have made it clearer that I already understood the issue with the inner loops, which I'm actually quite happy to execute sequentially. However, thanks for the tip about introducing a scalar reduction variable if parallelisation is required.

David
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Tue May 22, 2012 9:21 am

Hi David,

Note that in the meantime, you can use the PGI Accelerator Model instead of OpenACC, since the OpenACC "kernels" model is based on it.
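
As a rough sketch of what that could look like, the loop nest below uses PGI Accelerator Model directives ("acc region" and "acc for") in place of the OpenACC "kernels" and "loop" directives, with the two inner loops left sequential. The directive spellings are written from memory of the PGI Accelerator Model and should be checked against the PGI documentation for your release. The sizes NX, NY and D, the function name sharpen_region, and the precomputed weight table are assumptions for illustration only.

```c
#include <assert.h>

/* Hypothetical fixed sizes, chosen only so the sketch is self-contained. */
#define NX 8
#define NY 8
#define D  2

/* weight[][] is an assumed table of precomputed filter values;
   it stands in for the per-iteration filter calculation in the post. */
void sharpen_region(double sharp[NX][NY],
                    double fuzzy[NX + 2*D][NY + 2*D],
                    double weight[2*D + 1][2*D + 1])
{
    int i, j, k, l;

    assert(sharp != 0 && fuzzy != 0 && weight != 0);

    /* PGI Accelerator Model: "region" plays the role of OpenACC "kernels".
       Data movement is left to the compiler here, since the arrays have fixed sizes. */
#pragma acc region
    {
#pragma acc for parallel, vector(16)
        for (i = 0; i < NX; i++) {
#pragma acc for parallel, vector(16)
            for (j = 0; j < NY; j++) {
                double sum = 0.0;   /* per-pixel accumulator */

                /* Inner loops run sequentially within each accelerator thread. */
#pragma acc for seq
                for (k = -D; k <= D; k++) {
#pragma acc for seq
                    for (l = -D; l <= D; l++) {
                        sum += weight[k + D][l + D] * fuzzy[i + D + k][j + D + l];
                    }
                }
                sharp[i][j] += sum;
            }
        }
    }
}
```

As with OpenACC, a compiler that does not recognise the pragmas simply ignores them, so the same source still builds and runs on the host.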

- Mat
All times are GMT - 7 Hours