PGI User Forum


Error with #define statement for accessing array
kyle.niemeyer



Joined: 02 Oct 2008
Posts: 6

Posted: Mon Dec 10, 2012 2:46 pm    Post subject: Error with #define statement for accessing array

I seem to have run into a possible bug when using a #define macro to map 2D array indexing onto a 1D array with OpenACC, e.g.:
Code:
#define pres_red(I, J) pres_red[((I) * ((NUM_2) + 2)) + (J)]


I use this in a number of places in my code, and I found that certain locations in device memory (strangely, not all of them) were not being accessed correctly. The same code without OpenACC (CPU only) works fine, and replacing the macro calls with the explicit 1D index appears to fix the problem.

Is this a known issue?
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Dec 10, 2012 3:03 pm

Hi Kyle,

What's the message from "-Minfo=accel"? My guess is that, because you use a computed index, the compiler can't tell that the iterations of the loop don't all update the same element of the array. In these cases, you need to add the "independent" clause to the "loop" directive to tell the compiler that the iterations are independent and can therefore be parallelized.

- Mat
kyle.niemeyer



Joined: 02 Oct 2008
Posts: 6

Posted: Mon Dec 10, 2012 3:12 pm

Hi Mat,

Unfortunately, I'm not getting an error message from the compiler — it compiles and runs without any apparent issue. The error is in the results.

I actually already have the "independent" clause on both loop directives; the function where I'm seeing the error looks like this:
Code:

#pragma acc kernels present(F[0:SIZE], G[0:SIZE], pres_red[0:SIZEP], pres_black[0:SIZEP])
#pragma acc loop independent
for (col = 1; col < NUM + 1; ++col) {
  #pragma acc loop independent
  for (row = 1; row < (NUM / 2) + 1; ++row) {

    // NUM_2 is referenced inside the pres_red macro
    int NUM_2 = NUM >> 1;

    Real p_ij = pres_black(col, row);

    // the four red neighbours of black cell (col, row)
    Real p_im1j = pres_red(col - 1, row);
    Real p_ip1j = pres_red(col + 1, row);
    Real p_ijm1 = pres_red(col, row - ((col + 1) & 1));
    Real p_ijp1 = pres_red(col, row + (col & 1));

    // right-hand side
    Real rhs = (((F(col, (2 * row) - ((col + 1) & 1))
                - F(col - 1, (2 * row) - ((col + 1) & 1))) / dx)
              + ((G(col, (2 * row) - ((col + 1) & 1))
                - G(col, (2 * row) - ((col + 1) & 1) - 1)) / dy)) / dt;

    // over-relaxed update with factor omega
    pres_black(col, row) = p_ij * (ONE - omega) + omega *
        (((p_ip1j + p_im1j) / (dx * dx))
         + ((p_ijp1 + p_ijm1) / (dy * dy)) - rhs)
        / ((TWO / (dx * dx)) + (TWO / (dy * dy)));
  }
}


While testing, I noticed that the "p_ip1j" variable wasn't reading the correct location in the "pres_red" array (which is computed in a previous function that looks similar to this one).

When I changed the pres_red accesses to the explicit 1D form (the same expression as in the #define), that seemed to correct the issue. Strangely, the accesses to the "F" and "G" arrays, which use similar #define macros, are fine.

My only guess is that the #define misbehaves because it relies on a local variable (NUM_2), whereas the #define macros for F and G do not.
kyle.niemeyer



Joined: 02 Oct 2008
Posts: 6

Posted: Mon Dec 10, 2012 3:20 pm

It turns out I spoke too soon... After replacing all my pres_red and pres_black references with the explicit 1D index, the code fails again (the CPU version is still fine, however).

In case it helps, here is the -Minfo=accel output:
Code:
    450, Generating present(pres_black[0:24])
         Generating present(pres_red[0:24])
         Generating present(G[0:36])
         Generating present(F[0:36])
         Generating compute capability 2.0 binary
    452, Loop is parallelizable
    454, Loop is parallelizable
         Accelerator kernel generated
        452, #pragma acc loop gang, vector(8) /* blockIdx.y threadIdx.y */
        454, #pragma acc loop gang, vector(32) /* blockIdx.x threadIdx.x */
             Cached references to size [(x)] block of 'pres_red'
             CC 2.0 : 40 registers; 0 shared, 88 constant, 0 local memory bytes
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Dec 10, 2012 4:14 pm

Hi Kyle,

Can you please send a reproducing example to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me? While I highly doubt the problem is with the macro, I'm not sure what the issue is and it will take some digging to figure it out.

Thanks,
Mat
All times are GMT - 7 Hours