PGI User Forum
 SearchSearch   MemberlistMemberlist     RegisterRegister   ProfileProfile    Log inLog in 

Free OpenACC Course

Kernel with global scope variables

 
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming
View previous topic :: View next topic  
Author Message
istvanreguly



Joined: 04 Apr 2013
Posts: 3

PostPosted: Fri Apr 11, 2014 7:41 am    Post subject: Kernel with global scope variables Reply with quote

Hello,

I would like to use global scope constants in my OpenACC region, somehow as it is outlined below, however I do not seem to be able to get "calculate" to inline (I do use -Minline). As I understand it may be necessary to use #pragma acc declare create (stride) but I am not quite sure where to put it, I tried putting a data region around the conditional and the loop, which did report that these creates were generated, but the function still did not inline. As soon as I removed any reference to stride, it did inline, but of course I got wrong results.
Unfortunately the way the codebase is laid out, there is no way to change it so that it would pass stride to calculate.
Could anyone help me with this?

Code:

int stride;
void calculate(const double *in, double *out) {
  out[0] = in[0]+in[0+stride];
}
void wrapper(...) {
  double *in=...;
  double *out=...;
  if (first_time) {
    stride = 15; //not compile time known
  }
  #pragma acc kernel deviceptr(in,out)
  for (int i = 0; i < 100; i++) {
    for (int j = 0; j < 100; j++) {
      calculate(in+i+j*stride,out+i+j*stride);
    }
  }
}


Thank you,
Istvan
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6745
Location: The Portland Group Inc.

PostPosted: Fri Apr 11, 2014 8:04 am    Post subject: Reply with quote

Hi Istvan,

I tried to recreate your issue with the following code. There is an problem in that the compiler can't tell your pointers are not aliased so you need to add the "loop independent" clause, but no problem with inlining or global scoped variables. If I'm not recreating the problem, please post an example as well as the error you're seeing, the compiler version and which OS you're using.

Thanks,
Mat

Code:
% cat test.c
int stride;
static int first_time=1;
 void calculate(const double * in, double * out) {
   out[0] = in[0]+in[0+stride];
 }
 void wrapper(double * in, double * out) {
   int strd;
   if (first_time) {
     first_time=0;
     stride = 15; //not compile time known
   }
   #pragma acc kernels deviceptr(in,out)
   #pragma acc loop independent
   for (int i = 0; i < 100; i++) {
   #pragma acc loop independent
     for (int j = 0; j < 100; j++) {
       calculate(in+i+j*stride,out+i+j*stride);
     }
   }
 }
% pgcc -c test.c -Minline -Minfo -acc -V14.3
wrapper:
     12, Generating NVIDIA code
     14, Loop is parallelizable
     16, Loop is parallelizable
         Accelerator kernel generated
         14, #pragma acc loop gang /* blockIdx.y */
         16, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
     17, calculate inlined, size=4, file test.c (3)
Back to top
View user's profile
istvanreguly



Joined: 04 Apr 2013
Posts: 3

PostPosted: Fri May 02, 2014 1:59 am    Post subject: Reply with quote

Thank you, this seems to have done the trick for most loops. However, I am still having trouble with reductions:
Code:

#define MIN(a,b) (a<b)?(a):(b)
void ops_par_loop_calc_dt_kernel_min(int x_size, int y_size, double *p_a0, double *red) {
  double p_a1 = 1e21;
  #pragma acc parallel copyin(p_a0[0:x_size*y_size])
  #pragma acc loop reduction(min:p_a1)
  for ( int n_y=0; n_y<y_size; n_y++ ){
    #pragma acc loop reduction(min:p_a1)
    for ( int n_x=0; n_x<x_size; n_x++ ){
      p_a1 = MIN(p_a1,p_a0[n_x+n_y*x_size]);
    }
 }
 *red = p_a1;
}

when compiled with 14.2:
Code:

pgcpp -acc -ta=tesla:cc3x,keepgpu -O2 -Minline -Kieee -Minform=inform -Minfo=all calc_min.cpp -c -o calc_min.o

gives me the following report:
Code:

     16, Generating copyin(p_a0[0:x_size*y_size])
         Accelerator kernel generated
         19, #pragma acc loop gang /* blockIdx.x */
             Sum reduction generated for p_a1
         21, #pragma acc loop vector(256) /* threadIdx.x */
             Sum reduction generated for p_a1
     16, Generating NVIDIA code
     21, Loop is parallelizable

Why is a sum reduction being generated instead of a min reduction? I have looked into the generated calc_min.n001.gpu file, which, unless I am interpreting it wrong, does a threadblock-level sum reduction in shared memory. What am I doing wrong? I have tried using kernels instead of parallel and putting the reduction clause in all sorts of combinations but this still seems to come up.
Back to top
View user's profile
mkcolg



Joined: 30 Jun 2004
Posts: 6745
Location: The Portland Group Inc.

PostPosted: Fri May 02, 2014 10:12 am    Post subject: Reply with quote

It's C++ compiler error that's been fixed in 14.4.

Code:
% pgcpp -acc -c -Minfo=accel test.cpp -V14.2
ops_par_loop_calc_dt_kernel_min(int, int, double *, double *):
      3, Generating copyin(p_a0[0:y_size*x_size])
         Accelerator kernel generated
          6, #pragma acc loop gang /* blockIdx.x */
             Sum reduction generated for p_a1
          8, #pragma acc loop vector(256) /* threadIdx.x */
             Sum reduction generated for p_a1
      3, Generating NVIDIA code
      8, Loop is parallelizable

% pgcpp -acc -c -Minfo=accel test.cpp -V14.4
ops_par_loop_calc_dt_kernel_min(int, int, double *, double *):
      3, Generating copyin(p_a0[:y_size*x_size])
         Accelerator kernel generated
          6, #pragma acc loop gang /* blockIdx.x */
             Min reduction generated for p_a1
          8, #pragma acc loop vector(256) /* threadIdx.x */
             Min reduction generated for p_a1
      3, Generating Tesla code
      8, Loop is parallelizable


- Mat
Back to top
View user's profile
Display posts from previous:   
Post new topic   Reply to topic    PGI User Forum Forum Index -> Accelerator Programming All times are GMT - 7 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © phpBB Group