PGI User Forum

Oddity in OpenACC
nickaj



Joined: 06 Sep 2011
Posts: 21

Posted: Fri Mar 30, 2012 3:02 am    Post subject: Oddity in OpenACC

I would be grateful for any clues about this, particularly whether it is an install problem or a problem with my OpenACC code.

Here's the OpenACC bit:
Code:

#pragma acc data copy(arrC)
#pragma acc kernels
  for(j=0;j<sz;j++){
    for (i=0;i<sz;i++){
      arrC[j][i] = arrA[j][i]*alpha + arrB[j][i];
    }
  }


arrA, arrB and arrC are all sz * sz arrays, where sz = 10, so nothing huge.

The compiler generates what I'd expect
Code:

pgcc  -o basic basic.c -Minfo=accel,time  -acc -ta=nvidia
main:
     35, Generating copy(arrC[:][:])
     36, Generating copyin(arrA[:10][:10])
         Generating copyin(arrB[:10][:10])
         Generating compute capability 1.3 binary
         Generating compute capability 2.0 binary
     37, Loop is parallelizable
     38, Loop is parallelizable
         Accelerator kernel generated
         37, #pragma acc loop gang, vector(3) /* blockIdx.y threadIdx.y */
         38, #pragma acc loop gang, vector(10) /* blockIdx.x threadIdx.x */
             CC 1.3 : 13 registers; 116 shared, 12 constant, 0 local memory bytes; 25% occupancy
             CC 2.0 : 15 registers; 8 shared, 124 constant, 0 local memory bytes; 16% occupancy
  Timing stats:
    init                    50 millisecs    74%
    expand                  17 millisecs    25%
    Total time              67 millisecs


The vector sizes are quite short but this is a toy example so no problems there.
However, when I run the code I get:
Code:

./basic
call to EventSynchronize returned error 700: Launch failed
CUDA driver version: 4010

After a number of attempts, I do sometimes get it to run, but the output array (arrC) is unchanged (I set it to 0 before the accelerator region).

Interestingly, unsetting PGI_ACC_TIME changes this error to:
Code:

call to cuMemFree returned error 700: Launch failed
CUDA driver version: 4010


A few more attempts do eventually get it to run, but still with bad output.

-Nick.
nickaj



Joined: 06 Sep 2011
Posts: 21

Posted: Fri Mar 30, 2012 3:34 am

I should have added to my previous post that I'm compiling with the 12.3 compiler.
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Mar 30, 2012 10:17 am

Hi Nickaj,

In OpenACC, arrays are expected to be contiguous in memory. If arrC is a pointer to a pointer (an array of separately allocated rows), this would cause your program to abort abnormally.
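
To illustrate the contiguity point, here is a minimal sketch of one way to keep a 2-D array in a single block, assuming the data lives on the heap. The names alloc_contiguous and scale_add are hypothetical and are not taken from the original program. A pointer-to-pointer layout (double ** with one malloc per row) would scatter the rows, whereas a single allocation indexed as j*n + i can be described to the accelerator with one array section per array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an n x n matrix as ONE contiguous,
   zero-initialised block. Returns NULL if the allocation fails. */
double *alloc_contiguous(int n)
{
    assert(n > 0);
    return calloc((size_t)n * (size_t)n, sizeof(double));
}

/* Hypothetical helper: c = a*alpha + b over flat n*n arrays.
   Element (j, i) lives at index j*n + i, so each array is a single
   contiguous section that the data clauses can name directly.
   A compiler without OpenACC support ignores the pragma and runs
   the loops on the host. */
void scale_add(double *restrict c, const double *restrict a,
               const double *restrict b, double alpha, int n)
{
    #pragma acc kernels copyin(a[0:n*n], b[0:n*n]) copy(c[0:n*n])
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            c[j * n + i] = a[j * n + i] * alpha + b[j * n + i];
        }
    }
}
```

This only shows the memory layout. It is not claimed to resolve the launch failure reported above.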

Can you post a full reproducing example which also includes how your arrays are declared?

Thanks,
Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Fri Mar 30, 2012 2:35 pm

Hi Nickaj,

FYI, we got another report of a similar issue, and this one does look like a compiler error with the beta OpenACC. Still, would you mind posting a bit more of your code so I can confirm that it's the same issue?

Thanks,
Mat
nickaj



Joined: 06 Sep 2011
Posts: 21

Posted: Mon Apr 02, 2012 1:23 am

Here's my complete code:
Code:

#include <stdio.h>
#include <stdlib.h>
#include <openacc.h>


int main(int argc, char *argv[])
{

  int i = 0;
  int j = 0;
  int sz = 10;

  double arrA[sz][sz];
  double arrB[sz][sz];
  double arrC[sz][sz];
  for(j=0;j<sz;j++){
    for (i=0;i<sz;i++){
      arrA[j][i] = 1;
      arrB[j][i] = 2;
      arrC[j][i] = 0;
    }
  }

  double alpha = 0.5;

#pragma acc data copy(arrC[:10][:10])
#pragma acc kernels
  for(j=0;j<sz;j++){
    for (i=0;i<sz;i++){
      arrC[j][i] = arrA[j][i]*alpha + arrB[j][i];
    }
  }

  for(j=0;j<2;j++){
    for(i=0;i<10;i++){
      printf("arrC[%d][%d] = %lf\n", j, i, arrC[j][i]);
    }
  }

  return 0;
}
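
Since the symptom is an output array that stays at zero, a host-side check makes a bad run easy to detect without reading the printout. The following is a minimal sketch, assuming C99 variable-length array parameters as used in the program above. The helper name count_mismatches and the tolerance argument are hypothetical additions, not part of the original code.

```c
#include <assert.h>

/* Hypothetical helper (not from the thread): count the entries of c
   that differ from the host-computed value a*alpha + b by more than tol. */
int count_mismatches(int n, double c[n][n], double a[n][n], double b[n][n],
                     double alpha, double tol)
{
    assert(n > 0);
    int bad = 0;
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++) {
            /* Reference value recomputed on the host for this element. */
            double diff = c[j][i] - (a[j][i] * alpha + b[j][i]);
            if (diff < 0) {
                diff = -diff;   /* absolute difference without math.h */
            }
            if (diff > tol) {
                bad++;
            }
        }
    }
    return bad;
}
```

Called as count_mismatches(sz, arrC, arrA, arrB, alpha, 1e-12) after the accelerator region, this would return 0 on a correct run and sz*sz (100 here) if arrC were left at its initial zeros.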
All times are GMT - 7 Hours
Page 1 of 3

 