PGI User Forum


Question about the reduction clause in OpenACC

 
catfishwolf



Joined: 31 Mar 2013
Posts: 8

Posted: Sat Jul 27, 2013 3:42 pm    Post subject: Question about the reduction clause in OpenACC

Hi, Everyone,

I have a question about an article on the Dr. Dobb's website that combines OpenACC and OpenMP in a single program unit (http://www.drdobbs.com/parallel/the-openacc-execution-model/240006334?pgno=2). The code snippet in question is excerpted below. Can anyone tell me why the reduction clause (i.e., reduction(+:tmp)) is missing from the OpenACC pragma on line 16, while the same reduction clause (for the same loop) is still used by the OpenMP pragma on line 15?

Thanks,
Li

Code:

1  void gramSchmidt(restrict float Q[][COLS], const int rows, const int cols)
2  {
3  #pragma acc data copy(Q[0:rows][0:cols])
4   for(int k=0; k < cols; k++) {
5      double tmp = 0.;
6  #pragma omp parallel for reduction(+:tmp)
7  #pragma acc parallel reduction(+:tmp)
8      for(int i=0; i < rows; i++) tmp +=  (Q[i][k] * Q[i][k]);
9      tmp = sqrt(tmp);
10     
11 #pragma omp parallel for
12 #pragma acc parallel loop
13    for(int i=0; i < rows; i++) Q[i][k] /= tmp;
14     
15 #pragma omp parallel for reduction(+:tmp)
16 #pragma acc parallel loop
17     for(int j=k+1; j < cols; j++) {
18       tmp=0.;
19       for(int i=0; i < rows; i++) tmp += Q[i][k] * Q[i][j];
20       for(int i=0; i < rows; i++) Q[i][j] -= tmp * Q[i][k];
21     }
22   }
23 }
mkcolg



Joined: 30 Jun 2004
Posts: 6206
Location: The Portland Group Inc.

Posted: Mon Jul 29, 2013 9:53 am    Post subject: Re: Question about the reduction clause in OpenACC

Hi Li,

To me, the question is not why it's missing from OpenACC but why it's included for OpenMP.

Only the outer loop (line 17) is parallelized, so the inner loops (lines 19 and 20) run sequentially within each iteration. The OpenACC reduction clause is needed only when the reduction itself is performed in parallel, since that requires extra code to set up a partial reduction and then launch a second kernel to perform the final reduction.
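
As a rough sketch of the two cases (not the article's code): the function names, the fixed ROWS/COLS sizes, and the explicit `loop seq` directives below are illustrative assumptions. Here tmp is also declared inside the outer loop body so that each iteration clearly has its own copy, which differs from the excerpt, where tmp is declared outside the loop on line 5.

```c
#define ROWS 4
#define COLS 3

/* Case 1: the i loop itself is parallel and every iteration adds into one
   shared scalar, so the partial sums must be combined: reduction needed. */
double column_sum_of_squares(float Q[ROWS][COLS], int k)
{
    double tmp = 0.0;
#pragma acc parallel loop reduction(+:tmp) copyin(Q[0:ROWS][0:COLS])
    for (int i = 0; i < ROWS; i++)
        tmp += (double)Q[i][k] * Q[i][k];
    return tmp;
}

/* Case 2: only the outer j loop is parallel. Each j iteration owns its own
   tmp and walks the i loops sequentially, so no reduction clause is used. */
void orthogonalize_against(float Q[ROWS][COLS], int k)
{
#pragma acc parallel loop copy(Q[0:ROWS][0:COLS])
    for (int j = k + 1; j < COLS; j++) {
        double tmp = 0.0;            /* private to this j iteration */
#pragma acc loop seq
        for (int i = 0; i < ROWS; i++)
            tmp += Q[i][k] * Q[i][j];
#pragma acc loop seq
        for (int i = 0; i < ROWS; i++)
            Q[i][j] -= tmp * Q[i][k];
    }
}
```

A compiler without OpenACC support should simply ignore the pragmas, so both functions can also be checked serially. If the inner i loops were parallelized as well, the accumulation into tmp would then presumably need its own reduction clause on that inner loop.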

I'll send a note to Rob and ask if he'll clarify his intent here.

Best Regards,
Mat
All times are GMT - 7 Hours