PGI User Forum


How to parallelize this loop...
PaulPa



Joined: 02 Aug 2012
Posts: 35

Posted: Wed Sep 26, 2012 4:22 pm

Hi,

here is yet another loop I'm having problems with. The compiler keeps telling
me that it has loop-carried dependencies despite the fact that I'm
using the restrict keyword.

Code:

#pragma acc parallel loop gang private(i,j) present(X[0:n*n], XNext[0:n*n]) vector_length(32)
    for(i=1;i < n-1 ;++i){
        double max = 0.0;

#pragma acc loop vector
        for(j=1;j < n-1 ;++j){

            //average the four neighbouring vertices
            XNext[IDX(i,j,n)] = (X[IDX(i+1,j,n)] +  X[IDX(i-1,j,n)] + X[IDX(i,j+1,n)] + X[IDX(i,j-1,n)])/4.0;

            //find maximum difference between old and new solution
            max = fabs(XNext[IDX(i,j,n)] - X[IDX(i,j,n)]);
            if(delta < max)
                delta = max;
        }
    }

    return delta;
}


Here is the compiler feedback:

Code:

22, Accelerator kernel generated
         22, CC 2.0 : 21 registers; 0 shared, 100 constant, 0 local memory bytes
         23, #pragma acc loop gang /* blockIdx.x */
         27, #pragma acc loop vector(32) /* threadIdx.x */
         34, Max reduction generated for delta
     22, Generating present(XNext[0:n*n])
         Generating present(X[0:n*n])
         Generating compute capability 2.0 binary
     27, Complex loop carried dependence of '*(X)' prevents parallelization
         Complex loop carried dependence of '*(XNext)' prevents parallelization


It seems that this loop has been parallelized, but why is there this dependence? If I compile with -Msafeptr these dependencies disappear and
the program runs a little faster (14.7 s with -Msafeptr vs. 15.9 s without, both with acc_init()).

I'm very thankful for any advice.

Best,
Paul
mkcolg



Joined: 30 Jun 2004
Posts: 6070
Location: The Portland Group Inc.

Posted: Thu Sep 27, 2012 8:54 am

Hi Paul,

How are you using "restrict"? Users often put it in the wrong place, which changes its meaning. The qualifier has to come after the '*' so that it applies to the pointer itself. For example, this is correct:
Code:
float * restrict A;

But this is incorrect:
Code:
float restrict * A;
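
To make the placement concrete, here is a minimal sketch of how the qualifier might be applied to a loop like the one in the first post. The function name jacobi_sweep and the row-major IDX definition are assumptions for illustration; neither appears in the thread. The OpenACC directives are left out so that the sketch compiles with any C99 compiler.

Code:

```c
#include <math.h>
#include <stddef.h>

/* Assumption: row-major indexing. The thread never shows the real IDX macro. */
#define IDX(i, j, n) ((size_t)(i) * (size_t)(n) + (size_t)(j))

/*
 * Hypothetical wrapper around the loop from the first post.
 * "restrict" comes after the '*', so it qualifies the pointer itself and
 * promises the compiler that X and XNext never refer to the same memory.
 */
double jacobi_sweep(const double *restrict X, double *restrict XNext, int n)
{
    double delta = 0.0;
    int i, j;

    /* The "#pragma acc parallel loop ..." directive would go here. */
    for (i = 1; i < n - 1; ++i) {
        /* The "#pragma acc loop vector" directive would go here. */
        for (j = 1; j < n - 1; ++j) {
            /* Average the four neighbouring vertices. */
            XNext[IDX(i, j, n)] = (X[IDX(i + 1, j, n)] + X[IDX(i - 1, j, n)]
                                 + X[IDX(i, j + 1, n)] + X[IDX(i, j - 1, n)]) / 4.0;

            /* Track the largest change between the old and new solution. */
            double diff = fabs(XNext[IDX(i, j, n)] - X[IDX(i, j, n)]);
            if (delta < diff)
                delta = diff;
        }
    }

    return delta;
}
```

With both pointers restrict-qualified, the compiler is told that writes through XNext cannot alias reads through X. That is roughly the promise -Msafeptr makes for every pointer at once, so the per-pointer qualifier is the narrower of the two options.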


- Mat
PaulPa



Joined: 02 Aug 2012
Posts: 35

Posted: Sun Sep 30, 2012 10:46 am

Hi Mat,

thank you for the clarification, it's working.

Best,
Paul
PaulPa



Joined: 02 Aug 2012
Posts: 35

Posted: Mon Dec 17, 2012 10:39 am

Hi,

I recently compared the performance of the sparse matrix-vector multiplication (see page 1) with my NVIDIA CUDA implementation, and it turned out that the
OpenACC version only reaches ~30% of the CUDA performance.

The OpenACC implementation follows the design of the CUDA version very closely. Moreover, both versions launch the same number of thread blocks and threads and use the same number of registers (i.e. 20 registers per thread).

Do you have a guess why there is such a big performance delta between the two implementations?

Best,
Paul
mkcolg



Joined: 30 Jun 2004
Posts: 6070
Location: The Portland Group Inc.

Posted: Mon Dec 17, 2012 11:39 am

Hi Paul,

What does the profile output show (set the environment variable PGI_ACC_TIME=1 to get it), and how does it compare to the CUDA run? There are three things to compare: data movement, initialization time, and kernel execution time. Which one are you comparing, or are you looking at total time?

- Mat