PGI User Forum

PGI accelerator model with nested loops

 
tannguyen



Joined: 26 Jul 2010
Posts: 11

Posted: Wed Sep 08, 2010 11:41 pm    Post subject: PGI accelerator model with nested loops

Hi,

I am trying to use the PGI Accelerator model for my 3-D Jacobi application, which contains three nested loops. The loops are rectangular and have no loop-carried dependencies. I compiled with the -Msafeptr switch so that the compiler can assume the pointers do not alias. However, it seems that the pgcc compiler parallelized only the outermost loop.

#pragma acc data region copy(U[0:N+1][0:N+1][0:N+1]) copyin(Un[0:N+1][0:N+1][0:N+1]) copyin(b[0:N-1][0:N-1][0:N-1]) local(tmp[0:N+1][0:N+1][0:N+1])
{
    for (int it = 1; it <= nIters; it++) {
        #pragma acc region
        {
            for (k=1; k<N+1; k++)
                for (j=1; j<N+1; j++)
                    for (i=1; i<N+1; i++)
                        Un[i][j][k] = c * (U[i-1][j][k] + U[i+1][j][k] + U[i][j-1][k] + U[i][j+1][k] + U[i][j][k-1] + U[i][j][k+1] - c2*b[i-1][j-1][k-1]);
        }

        tmp = U;
        U = Un;
        Un = tmp;
    }
}

Here is the message from the compiler:

146, Generating local(tmp[:N+1][:N+1][:N+1])
     Generating copyin(b[:N-1][:N-1][:N-1])
     Generating copyin(Un[:N+1][:N+1][:N+1])
     Generating copy(U[:N+1][:N+1][:N+1])
155, Loop is parallelizable
     Accelerator kernel generated
    155, #pragma acc for parallel, vector(256)
156, Loop is parallelizable
157, Loop is parallelizable
Is this due to a current restriction of the PGI model?
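
For checking accelerator output against a host result, a plain C version of the same sweep can be useful. The following is only a sketch under assumptions that are not in the post: the grids are stored as flat arrays of (N+2)^3 doubles, b as a flat array of N^3 doubles, and idx, jacobi_sweep, and jacobi_run are hypothetical helper names. It carries no accelerator directives.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flat index of (i, j, k) in an (n+2)^3 grid. */
size_t idx(size_t n, size_t i, size_t j, size_t k)
{
    const size_t m = n + 2;
    return (i * m + j) * m + k;
}

/* One Jacobi sweep over the interior points 1..n; reads u, writes un.
 * b holds n^3 values and is indexed with (i-1, j-1, k-1), as in the post. */
void jacobi_sweep(size_t n, const double *u, double *un,
                  const double *b, double c, double c2)
{
    assert(n >= 1);
    for (size_t k = 1; k <= n; k++)
        for (size_t j = 1; j <= n; j++)
            for (size_t i = 1; i <= n; i++) {
                const size_t bi = ((i - 1) * n + (j - 1)) * n + (k - 1);
                un[idx(n, i, j, k)] =
                    c * (u[idx(n, i - 1, j, k)] + u[idx(n, i + 1, j, k)] +
                         u[idx(n, i, j - 1, k)] + u[idx(n, i, j + 1, k)] +
                         u[idx(n, i, j, k - 1)] + u[idx(n, i, j, k + 1)] -
                         c2 * b[bi]);
            }
}

/* Runs iters sweeps, swapping the two grid pointers after each one.
 * Returns the grid that holds the latest result. */
double *jacobi_run(size_t n, double *u, double *un,
                   const double *b, double c, double c2, int iters)
{
    for (int it = 1; it <= iters; it++) {
        jacobi_sweep(n, u, un, b, c, c2);
        double *tmp = u;
        u = un;
        un = tmp;
    }
    return u;
}
```

The pointer returned by jacobi_run tells which of the two arrays holds the latest result; comparing it element by element with the data copied back from the accelerator gives a simple correctness check.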
tannguyen



Joined: 26 Jul 2010
Posts: 11

Posted: Wed Sep 08, 2010 11:54 pm

I also used loop directives to tell the compiler how to map the loop parallelism onto GPU parallelism, but it didn't help:

#pragma acc region
{
    #pragma acc for parallel vector(8)   /* map to blocks */
    for (j=1; j<N+1; j++) {
        #pragma acc for seq unroll(4)    /* sequential */
        for (k=1; k<N+1; k++) {
            #pragma acc for vector(8)    /* map to threads */
            for (i=1; i<N+1; i++) {
                Un[i][j][k] = c * (U[i-1][j][k] + U[i+1][j][k] + U[i][j-1][k] + U[i][j+1][k] + U[i][j][k-1] + U[i][j][k+1] - c2*b[i][j][k]);
            }
        }
    }
}

The message from the compiler is:

157, Loop is parallelizable
     Accelerator kernel generated
    157, #pragma acc for parallel, vector(8)
159, Loop is parallelizable
162, Loop is parallelizable
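
While only the outermost loop is being parallelized, one possible workaround is to collapse the three loops by hand into a single loop over a linear index, so that the one parallelized loop covers all N^3 interior points. This is an assumption here, not something suggested in this thread, and it has not been verified with pgcc. A plain C sketch, assuming flat (N+2)^3 grids, a flat N^3 array for b indexed as in the first post, and the hypothetical name jacobi_sweep_flat:

```c
#include <assert.h>
#include <stddef.h>

/* One Jacobi sweep written as a single loop over all n^3 interior points.
 * u and un are flat (n+2)^3 grids; b is a flat n^3 array. */
void jacobi_sweep_flat(size_t n, const double *u, double *un,
                       const double *b, double c, double c2)
{
    assert(n >= 1);
    const size_t m = n + 2;          /* grid extent including the boundary */
    const size_t total = n * n * n;  /* number of interior points */

    for (size_t t = 0; t < total; t++) {
        /* Recover the 3-D interior coordinates (1..n) from the linear index. */
        const size_t i = t / (n * n) + 1;
        const size_t j = (t / n) % n + 1;
        const size_t k = t % n + 1;
        const size_t p = (i * m + j) * m + k;  /* flat position in the grid */

        /* b[t] corresponds to b[i-1][j-1][k-1] in the nested-loop version. */
        un[p] = c * (u[p - m * m] + u[p + m * m] +
                     u[p - m] + u[p + m] +
                     u[p - 1] + u[p + 1] -
                     c2 * b[t]);
    }
}
```

The iterations stay independent because each one writes a distinct element of un and only reads u and b. The cost is the extra integer division and modulo work per point.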

Tan.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Thu Sep 09, 2010 10:56 am

Hi Tan,

This is a known issue with how the compiler was treating the outer loop's index variable. The good news is that it will be fixed in this month's 10.9 release.

Thanks,
Mat
tannguyen



Joined: 26 Jul 2010
Posts: 11

Posted: Thu Sep 09, 2010 11:17 am

Thanks Mat, I can't wait to see the new release :).

Tan.
All times are GMT - 7 Hours