PGI User Forum

Privatization of array
Viet



Joined: 12 Sep 2009
Posts: 8

Posted: Thu Oct 01, 2009 6:11 am    Post subject: Privatization of array

I ran a parallel loop as follows:

#define imax 257
#define jmax 129
#define kmax 129
#define nn 50

static float arr1[imax][jmax][kmax];
static float arr2[imax][jmax][kmax];

#pragma acc region
{
   for(n=1;n<nn-1;++n){

      for(i=1 ; i<imax-1 ; ++i){
         for(j=1 ; j<jmax-1 ; ++j){
            for(k=1 ; k<kmax-1 ; ++k){
               arr1[i][j][k] = arr2[i][j][k];
            }
         }
      }

      for(i=1 ; i<imax-1 ; ++i){
         for(j=1 ; j<jmax-1 ; ++j){
            for(k=1 ; k<kmax-1 ; ++k){
               arr2[i][j][k] = arr1[i][j][k];
            }
         }
      }
   } /* end n loop */

}


I got the following message when compiling:

Parallelization would require privatization of array arr1[i2+1][i3+1][1:kmax-2]

I privatized the arrays by hand, adding a leading dimension indexed by n:


static float arr1[nn][imax][jmax][kmax];
static float arr2[nn][imax][jmax][kmax];

#pragma acc region
{
   for(n=1;n<nn-1;++n){

      for(i=1 ; i<imax-1 ; ++i){
         for(j=1 ; j<jmax-1 ; ++j){
            for(k=1 ; k<kmax-1 ; ++k){
               arr1[n][i][j][k] = arr2[n][i][j][k];
            }
         }
      }

      for(i=1 ; i<imax-1 ; ++i){
         for(j=1 ; j<jmax-1 ; ++j){
            for(k=1 ; k<kmax-1 ; ++k){
               arr2[n][i][j][k] = arr1[n][i][j][k];
            }
         }
      }
   } /* end n loop */

}

I got an out-of-memory error when running the code:

call to cuMemAlloc returned error 2: Out of memory

Is there a different way to privatize the arrays that avoids the out-of-memory error?

Thanks in advance,
Viet
TheMatt



Joined: 06 Jul 2009
Posts: 322
Location: Greenbelt, MD

Posted: Thu Oct 01, 2009 9:12 am    Post subject:

Viet,

Did you try privatizing the array using the pragma clauses? With Fortran it's an !$acc do clause, so I imagine it's a #pragma acc for one in C (I'm a Fortran programmer so caveat lector).

So you might try:
#pragma acc region
{
#pragma acc for private(arr1)
for(n=1;n<nn-1;++n){...

Essentially, you add that private clause on the line directly before the for-loop that it must apply to.
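As a rough illustration of what a private clause does, here is a minimal sketch in which every loop iteration needs its own small scratch array. It uses the later OpenACC spelling (`parallel loop` with data clauses) rather than the `acc region` / `acc for` form used in this thread, and the function name `scaled_sums`, the constant `TAPS`, and the computation itself are hypothetical, chosen only for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4  /* hypothetical scratch size, chosen only for illustration */

/* Hypothetical helper: out[i] = sum over t of in[i] * (t + 1).
   Every iteration fills the scratch array before reading it, so each
   iteration (thread) needs its own copy: that is what private() gives. */
void scaled_sums(const float *in, float *out, size_t n)
{
    float scratch[TAPS];
    assert(in != NULL && out != NULL);

    /* Assumption: OpenACC 2.x syntax. A compiler without OpenACC support
       ignores the pragma and runs the loop serially. */
    #pragma acc parallel loop private(scratch) copyin(in[0:n]) copyout(out[0:n])
    for (size_t i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (int t = 0; t < TAPS; ++t)
            scratch[t] = in[i] * (float)(t + 1);
        for (int t = 0; t < TAPS; ++t)
            sum += scratch[t];
        out[i] = sum;  /* only out[] is copied back; scratch is discarded */
    }
}
```

The scratch array here is tiny and its contents are never needed after the loop, which is the situation a private clause fits well. Large arrays that carry results out of the region, like arr1 and arr2 above, are a different case.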
Viet



Joined: 12 Sep 2009
Posts: 8

Posted: Fri Oct 02, 2009 3:36 am    Post subject:

Dear TheMatt,

Thank you very much. I was able to use the private clause to privatize the arrays as follows:


#pragma acc region
{
#pragma acc for private(arr2[1:imax-2][1:jmax-2][1:kmax-2], arr1[1:imax-2][1:jmax-2][1:kmax-2])
   for(n=1;n<nn-1;++n){

      for(i=1 ; i<imax-1 ; ++i){
         for(j=1 ; j<jmax-1 ; ++j){
            for(k=1 ; k<kmax-1 ; ++k){
               arr1[i][j][k] = arr2[i][j][k];
            }
         }
      }

      for(i=1 ; i<imax-1 ; ++i){
         for(j=1 ; j<jmax-1 ; ++j){
            for(k=1 ; k<kmax-1 ; ++k){
               arr2[i][j][k] = arr1[i][j][k];
            }
         }
      }

   } /* end n loop */
}

However, the out-of-memory error still occurs. Is there a better way to privatize the arrays in this case?

Thanks in advance,
Viet
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Fri Oct 02, 2009 11:46 am    Post subject:

Hi Viet,

When you privatize an array, you are creating a temporary copy for each thread. This can dramatically increase your memory usage. Also, since the private arrays are temporary, their values are not stored back to the host. Full details on the private clause can be found in section 2.4.4 of the Accelerator model guide: http://www.pgroup.com/resources/accel.htm.
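To put numbers on the memory growth, here is a small sketch that computes the storage for the arrays in this thread. The helper name `array_bytes` is hypothetical, and the figures assume a 4-byte float. With imax=257, jmax=129, kmax=129, one array is about 17 MB; adding the leading nn=50 dimension makes each array about 855 MB, so arr1 and arr2 together need roughly 1.7 GB.

```c
#include <assert.h>

/* Hypothetical helper: bytes needed for `copies` float arrays of
   extent imax x jmax x kmax. */
unsigned long long array_bytes(unsigned long long copies,
                               unsigned long long imax,
                               unsigned long long jmax,
                               unsigned long long kmax)
{
    assert(imax > 0 && jmax > 0 && kmax > 0);
    return copies * imax * jmax * kmax * sizeof(float);
}
```

For example, `array_bytes(1, 257, 129, 129)` gives the size of one original array, and `array_bytes(2 * 50, 257, 129, 129)` gives the total for both arrays once the n dimension is added. The same multiplication applies to a private clause, with the number of per-thread copies in place of nn.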

Backing up to the original code, the reason the outer loop won't parallelize is that all values of n (i.e. all threads) need to access the same i, j, and k elements of the arrays. Depending on the order in which the threads store their results, the values stored in the arrays will change, leading to non-deterministic results.

Instead of having the "n" loop be the outer loop, could it be moved to the innermost loop? This will allow you to parallelize the i, j, and k loops, run the n loop inside the kernel, reduce the data movement, and increase your compute intensity.

For example:
Code:

#pragma acc region
{
   for(i=1 ; i<imax-1 ; ++i){
      for(j=1 ; j<jmax-1 ; ++j){
         for(k=1 ; k<kmax-1 ; ++k){
            for(n=1;n<nn-1;++n){
               arr1[i][j][k] = arr2[i][j][k];
            }
         }
      }
   }
   ...
}
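The following serial sketch checks that moving the n loop innermost gives the same result as the original ordering for this access pattern. The sizes, the names `init_grids`, `step_outer`, `step_inner`, and `orders_agree`, and the `+ 1.0f` update are hypothetical, added only so the comparison is meaningful. Placing both assignments inside one n loop is also an assumption about how the interchange would be completed. The equivalence holds only because each element depends on the element at the same [i][j][k]; an update that reads neighbouring elements would not allow this interchange.

```c
#include <assert.h>
#include <string.h>

enum { NI = 6, NJ = 5, NK = 4, NSTEPS = 7 };  /* small hypothetical sizes */

typedef float grid[NI][NJ][NK];

void init_grids(grid a1, grid a2)
{
    assert(NI > 2 && NJ > 2 && NK > 2 && NSTEPS > 2);
    for (int i = 0; i < NI; ++i)
        for (int j = 0; j < NJ; ++j)
            for (int k = 0; k < NK; ++k) {
                a1[i][j][k] = 0.0f;
                a2[i][j][k] = (float)(100 * i + 10 * j + k);
            }
}

/* Original ordering: n loop outermost, two full sweeps per step. */
void step_outer(grid a1, grid a2)
{
    for (int n = 1; n < NSTEPS - 1; ++n) {
        for (int i = 1; i < NI - 1; ++i)
            for (int j = 1; j < NJ - 1; ++j)
                for (int k = 1; k < NK - 1; ++k)
                    a1[i][j][k] = a2[i][j][k] + 1.0f;
        for (int i = 1; i < NI - 1; ++i)
            for (int j = 1; j < NJ - 1; ++j)
                for (int k = 1; k < NK - 1; ++k)
                    a2[i][j][k] = a1[i][j][k];
    }
}

/* Interchanged ordering: n loop innermost, both updates per element.
   The i, j, k iterations are independent of each other. */
void step_inner(grid a1, grid a2)
{
    for (int i = 1; i < NI - 1; ++i)
        for (int j = 1; j < NJ - 1; ++j)
            for (int k = 1; k < NK - 1; ++k)
                for (int n = 1; n < NSTEPS - 1; ++n) {
                    a1[i][j][k] = a2[i][j][k] + 1.0f;
                    a2[i][j][k] = a1[i][j][k];
                }
}

/* Returns 1 if both orderings produce identical arrays. */
int orders_agree(void)
{
    static grid a1, a2, b1, b2;
    init_grids(a1, a2);
    init_grids(b1, b2);
    step_outer(a1, a2);
    step_inner(b1, b2);
    return memcmp(a1, b1, sizeof(grid)) == 0 &&
           memcmp(a2, b2, sizeof(grid)) == 0;
}
```

In the interchanged form no (i, j, k) iteration reads or writes another iteration's elements, so neither array needs to be privatized.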


Hope this helps,
Mat
Viet



Joined: 12 Sep 2009
Posts: 8

Posted: Tue Oct 06, 2009 7:33 am    Post subject:

Dear Mat,

Thank you very much for the suggestion of moving the "n" loop to the innermost loop so as to parallelize the i, j, and k loops.

To do so, I also have to parallelize the computation of arr1 and arr2 and synchronize between them, as shown in the following sketches:

Code:


for(i=1 ; i<imax-1 ; ++i){
   for(j=1 ; j<jmax-1 ; ++j){
      for(k=1 ; k<kmax-1 ; ++k){
         for(n=1;n<nn-1;++n){
            arr1[i][j][k] = arr2[i][j][k];
            /* wait for arr2 to be computed at step n */
         }
      }
   }
}


for(i=1 ; i<imax-1 ; ++i){
   for(j=1 ; j<jmax-1 ; ++j){
      for(k=1 ; k<kmax-1 ; ++k){
         for(n=1;n<nn-1;++n){
            arr2[i][j][k] = arr1[i][j][k];
            /* wait for arr1 to be computed at step n */
         }
      }
   }
}



How can I use pragma clauses to implement this?

Thanks in advance,
Viet
All times are GMT - 7 Hours
Page 1 of 2