PGI User Forum

Nested loops in C

 
AlanGray



Joined: 27 Aug 2010
Posts: 1

Posted: Fri Aug 27, 2010 4:20 am    Post subject: Nested loops in C

I am trying to get the compiler to parallelize across two nested loops. This works as expected in Fortran, but in C the compiler (pgcc v10.6) reports that the inner loop is parallelizable yet only schedules the outer loop on the accelerator. I'd be grateful for any advice on how to do this. The simple example below illustrates the problem.
Code:

    20   #pragma accel region
    21     {
    22   #pragma acc for parallel, vector(16)
    23       for (i = 0; i<N; i++)
    24         {
    25   #pragma acc for parallel, vector(16)
    26           for (j = 0; j<N; j++)
    27             {
    28               b[i][j] = 2.*a[i][j];
    29             }
    30         }
    31     }//end accel region


Compilation:
Code:

[agray3@fermi0 nested]$ pgcc -ta=nvidia:cc20 -Minfo:accel nested.c
main:
     20, Generating copyout(b[0:255][0:255])
         Generating copyin(a[0:255][0:255])
         Generating compute capability 2.0 binary
     23, Loop is parallelizable
         Accelerator kernel generated
         23, #pragma acc for parallel, vector(16)
             CC 2.0 : 8 registers; 4 shared, 48 constant, 0 local memory bytes; 16 occupancy
     26, Loop is parallelizable
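
For readers unfamiliar with the schedule clauses, here is a minimal host-side sketch of what `parallel, vector(16)` on both loops is asking for. It assumes the usual reading of the PGI Accelerator model, in which `parallel` spreads strips of iterations across thread blocks and `vector(16)` runs 16 iterations within a block, so the two loops together form 16x16 tiles. The names `scale_tiled` and `run_tiled_demo`, and the fixed size `N = 256` (chosen to match the `0:255` bounds in the -Minfo output), are illustrative assumptions, not anything the compiler generates. In the output above only line 23 received a schedule, so each thread would presumably walk the whole `j` loop sequentially instead.

```c
#include <assert.h>

#define N    256   /* assumed size; matches the 0:255 bounds reported by -Minfo */
#define TILE 16    /* the vector(16) width used in the directives */

static float a[N][N], b[N][N];

/* Host-side emulation of a 2D "parallel, vector(16)" schedule.
   The two outer loops stand in for the grid of blocks ("parallel"),
   the two inner loops for the 16x16 threads of one block ("vector"). */
void scale_tiled(void)
{
    assert(N % TILE == 0);  /* this sketch only handles full tiles */

    for (int bi = 0; bi < N; bi += TILE)              /* blocks along i */
        for (int bj = 0; bj < N; bj += TILE)          /* blocks along j */
            for (int ti = 0; ti < TILE; ti++)         /* threads along i */
                for (int tj = 0; tj < TILE; tj++) {   /* threads along j */
                    int i = bi + ti;
                    int j = bj + tj;
                    b[i][j] = 2.0f * a[i][j];
                }
}

/* Fill a, run the tiled loops, and return the number of wrong elements. */
int run_tiled_demo(void)
{
    int bad = 0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = (float)(i * N + j);

    scale_tiled();

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (b[i][j] != 2.0f * a[i][j])
                bad++;

    return bad;
}
```

Calling `run_tiled_demo()` from a `main` should return 0 mismatches. On a real device every (block, thread) pair would run concurrently; the nested host loops only show which `(i, j)` element each pair would own.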
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Fri Aug 27, 2010 2:20 pm    Post subject:

Hi Alan,

I'm not sure why the inner loop is not being scheduled. I've sent an example on to one of our compiler engineers to find out whether it's a compiler issue or whether I'm missing something.

Thanks,
Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Thu Sep 09, 2010 11:02 am    Post subject:

Hi Alan,

I heard back from our compiler engineer. It turns out that this is a known issue that he was planning on addressing for the 11.0 release. However, since several users have recently reported the same issue, we bumped up the priority and were able to add the fix in this month's 10.9 release.

Thanks,
Mat

Code:
% cat test.c
int foo (int N, float ** b, float ** a) {

  int i, j;

#pragma accel region
  {
    for (i = 0; i<N; i++)
      {
        for (j = 0; j<N; j++)
          {
            b[i][j] = 2.*a[i][j];
          }
      }
  }//end accel region

  return 1;
}
% pgcc -c test.c -ta=nvidia -Minfo=accel -fast -Msafeptr -Mfcon -V10.9
foo:
      5, Generating copyout(b[0:N-1][0:N-1])
         Generating copyin(a[0:N-1][0:N-1])
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
      7, Loop is parallelizable
      9, Loop is parallelizable
         Accelerator kernel generated
          7, #pragma acc for parallel, vector(16)
          9, #pragma acc for parallel, vector(16)
             CC 1.0 : 6 registers; 24 shared, 40 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 6 registers; 24 shared, 40 constant, 0 local memory bytes; 100 occupancy
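
Since test.c is compiled with `-c` only, a small host driver may help anyone who wants to check the result. The following is a sketch under stated assumptions: `alloc_matrix`, `free_matrix` and `check_foo` are hypothetical helpers that do not appear in the thread, and the matrices are built as arrays of row pointers to match the `float **` signature. The constant is written as `2.0f` here, which is presumably the effect `-Mfcon` has on `2.` in the command line above. A compiler without accelerator support should simply ignore the unknown pragma and run the loops on the host.

```c
#include <assert.h>
#include <stdlib.h>

/* Same loops as test.c above, with a float constant. */
int foo(int N, float **b, float **a)
{
    int i, j;

#pragma accel region
    {
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                b[i][j] = 2.0f * a[i][j];
            }
        }
    } /* end accel region */

    return 1;
}

/* Hypothetical helper: allocate a zeroed n x n matrix as n row pointers. */
static float **alloc_matrix(int n)
{
    float **m = malloc((size_t)n * sizeof *m);
    assert(m != NULL);
    for (int i = 0; i < n; i++) {
        m[i] = calloc((size_t)n, sizeof **m);
        assert(m[i] != NULL);
    }
    return m;
}

/* Hypothetical helper: release a matrix built by alloc_matrix. */
static void free_matrix(float **m, int n)
{
    for (int i = 0; i < n; i++)
        free(m[i]);
    free(m);
}

/* Hypothetical check: run foo on an n x n input and
   return the number of elements where b != 2*a. */
int check_foo(int n)
{
    assert(n > 0);

    float **a = alloc_matrix(n);
    float **b = alloc_matrix(n);
    int bad = 0;

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[i][j] = (float)(i + j);

    foo(n, b, a);

    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (b[i][j] != 2.0f * a[i][j])
                bad++;

    free_matrix(a, n);
    free_matrix(b, n);
    return bad;
}
```

A call such as `check_foo(64)` should return 0. Note that a `float **` signature says nothing about whether `a` and `b` overlap; the `-Msafeptr` flag in the command line above is, as far as the PGI documentation describes it, how the compiler is told to assume they do not.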

All times are GMT - 7 Hours