PGI User Forum


prevent parallelization

 
wcj0626



Joined: 28 Feb 2012
Posts: 2

Posted: Mon Mar 19, 2012 8:21 pm    Post subject: prevent parallelization

I am using OpenACC to test matrix multiplication. The code is:
void gputest(float *a,float *b,float *c)
{
    int i,j,k;
#pragma acc data copy(c[:N*N]) copyin(a[:N*N],b[:N*N])
    {
#pragma acc parallel for
        for(i=0;i<N;i++)
        {
            for(j=0;j<N;j++)
            {
                for(k=0;k<N;k++)
                {
                    c[i*N+j]+=a[i*N+k]*b[k*N+j];
                }
            }
        }
    }
}
I compile the code with "pgcc -o matric-2 matric-2.c -ta=nvidia,cc2.0 -Minfo" and the output is:
52, Generating copyin(b[:9999])
    Generating copyin(a[:9999])
    Generating copy(c[:9999])
54, Generating compute capability 2.0 binary
56, Loop carried dependence of '*(c)' prevents parallelization
    Loop carried backward dependence of '*(c)' prevents vectorization
58, Loop is parallelizable
    #pragma acc loop gang, vector(96) /* blockIdx.x threadIdx.x */
61, Complex loop carried dependence of '*(c)' prevents parallelization
    Loop carried dependence of '*(c)' prevents parallelization
    Loop carried backward dependence of '*(c)' prevents vectorization
    Inner sequential loop scheduled on accelerator
    #pragma acc loop seq(96)
    Cached references to size [195] block of 'a'
    CC 2.0 : 23 registers; 788 shared, 44 constant, 0 local memory bytes; 50% occupancy.

What should I do to solve this problem?
Thanks!
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Tue Mar 20, 2012 9:32 am

Hi wcj0626,
Quote:
What should I do to solve this problem?
There are two problems here. First, since you don't use the "restrict" keyword, the compiler must assume that the a, b, and c pointers could point at the same location in memory. This will prevent parallelization. To fix, either add the restrict keyword or use the flag "-Msafeptr".
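
To see why possible aliasing forces the compiler to be conservative, here is a minimal sketch in plain C with no accelerator directives. The names add_one and add_one_restrict are made up for illustration and do not come from the code in this thread. When dst overlaps src, each iteration reads the value the previous iteration wrote, so the loop carries a real dependence. With restrict, the caller promises that this never happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dst[i] = src[i] + 1.
 * Without restrict the compiler must allow for dst and src overlapping,
 * so it has to keep the iterations in their original order. */
void add_one(float *dst, const float *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] + 1.0f;
    }
}

/* Same loop, but restrict tells the compiler that dst and src never
 * overlap, so the iterations are independent of each other.
 * Calling this with overlapping buffers is undefined behavior. */
void add_one_restrict(float *restrict dst, const float *restrict src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] + 1.0f;
    }
}
```

Calling add_one(buf + 1, buf, 3) on a buffer of four zeros leaves buf holding 0, 1, 2, 3, because every iteration depends on the one before it. Running those iterations in parallel would not reproduce that result, which is the case the compiler has to guard against when it cannot rule out aliasing.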

The second issue is the use of computed indices such as i*N+j. The compiler cannot always prove that different iterations of a loop touch different elements when the index is computed, hence you need to add the "independent" clause to tell it so.

Hope this helps,
Mat
Code:

$ cat test2.c
#define N 1024

void gputest(float * restrict a,float * restrict b,float * restrict c)
{
    int i,j,k;
#pragma acc data copy(c[:N*N]) copyin(a[:N*N],b[:N*N])
    {
#pragma acc region for independent
        for(i=0;i<N;i++)
        {
#pragma for independent
            for(j=0;j<N;j++)
            {
#pragma for independent
                for(k=0;k<N;k++)
                {
                    c[i*N+j]+=a[i*N+k]*b[k*N+j];
                }
            }
        }
    }
}
$ pgcc -ta=nvidia test2.c  -c -Minfo=accel
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (test2.c: 8)
gputest:
      6, Generating copyin(b[:1048575])
         Generating copyin(a[:1048575])
         Generating copy(c[:1048575])
      9, Loop is parallelizable
     12, Loop is parallelizable
     14, Complex loop carried dependence of '*(c)' prevents parallelization
         Loop carried dependence of '*(c)' prevents parallelization
         Loop carried backward dependence of '*(c)' prevents vectorization
         Inner sequential loop scheduled on accelerator
         Accelerator kernel generated
          9, #pragma acc for parallel, vector(16) /* blockIdx.y threadIdx.y */
         12, #pragma acc for parallel, vector(16) /* blockIdx.x threadIdx.x */
         14, #pragma acc for seq(16)
             Cached references to size [16399] block of 'a'
             Cached references to size [16399] block of 'b'
PGC/x86-64 Linux 12.3-0: compilation completed with warnings
wcj0626



Joined: 28 Feb 2012
Posts: 2

Posted: Tue Mar 20, 2012 7:08 pm

As I see it, after you compile the code there is still a "prevents parallelization" message:
.........
6, Generating copyin(b[:1048575])
Generating copyin(a[:1048575])
Generating copy(c[:1048575])
9, Loop is parallelizable
12, Loop is parallelizable
14, Complex loop carried dependence of '*(c)' prevents parallelization
Loop carried dependence of '*(c)' prevents parallelization
Loop carried backward dependence of '*(c)' prevents vectorization
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
.......
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Thu Mar 22, 2012 10:26 am

Hi wcj0626,

My mistake in putting the "independent" clause on the innermost loop. The compiler is correct in making the innermost (k) loop sequential, since every iteration of that loop updates the same element of c. Only the outer two loops are parallelizable.
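
For reference, a minimal sketch of that arrangement could look like the following. It is an assumption-laden illustration rather than the code that was compiled in this thread: it uses the standard OpenACC "kernels" and "loop" spellings instead of "region for", it picks a small hypothetical N so it is easy to check on the host, and it accumulates into a local scalar (sum) so that the sequential k loop no longer writes c on every iteration. A compiler without OpenACC support ignores the pragmas and runs the loops serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical small size for checking on the host; the thread uses 1024. */
#ifndef N
#define N 4
#endif

/* Computes c += a * b for row-major N x N matrices.
 * The i and j loops are marked independent; the k loop is kept sequential
 * because all of its iterations contribute to the same element of c. */
void gputest(float *restrict a, float *restrict b, float *restrict c)
{
    assert(a != NULL && b != NULL && c != NULL);
#pragma acc data copy(c[0:N*N]) copyin(a[0:N*N], b[0:N*N])
    {
#pragma acc kernels
        {
#pragma acc loop independent
            for (int i = 0; i < N; i++) {
#pragma acc loop independent
                for (int j = 0; j < N; j++) {
                    /* Start from the existing value to keep the += semantics. */
                    float sum = c[i * N + j];
#pragma acc loop seq
                    for (int k = 0; k < N; k++) {
                        sum += a[i * N + k] * b[k * N + j];
                    }
                    c[i * N + j] = sum;
                }
            }
        }
    }
}
```

With this layout the dependence on c stays inside a single (i, j) iteration, so a "prevents parallelization" note on the k loop is expected and harmless. Whether a given compiler release accepts these exact directive spellings is something to check against its documentation.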

- Mat
All times are GMT - 7 Hours