PGI User Forum

Oddity in OpenACC
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Mon Apr 02, 2012 10:49 am

Yep, same problem. The compiler is tripping over the use of VLAs (C99 variable-length arrays). Changing them to fixed-size or malloc'd arrays will work around the problem.

Since this just came in last Friday, I'm not sure we'll get it fixed by the 12.4 release, but we'll try.

- Mat

Code:
% cat test.c

#include <stdio.h>
#include <stdlib.h>
#include <openacc.h>

#define N 10

int main(int argc, char *argv[])
{

  int i = 0;
  int j = 0;
  int sz = N;

  double arrA[N][N];
  double arrB[N][N];
  double arrC[N][N];
  for(j=0;j<sz;j++){
    for (i=0;i<sz;i++){
      arrA[j][i] = 1;
      arrB[j][i] = 2;
      arrC[j][i] = 0;
    }
  }

  double alpha = 0.5;

#pragma acc data copy(arrC[:10][:10])
#pragma acc kernels
  for(j=0;j<sz;j++){
    for (i=0;i<sz;i++){
      arrC[j][i] = arrA[j][i]*alpha + arrB[j][i];
    }
  }

  for(j=0;j<2;j++){
    for(i=0;i<10;i++){
      printf("arrC[%d][%d] = %lf\n", j, i, arrC[j][i]);
    }
  }

  return 0;
}
% pgcc -acc test.c -V12.3
% a.out
arrC[0][0] = 2.500000
arrC[0][1] = 2.500000
arrC[0][2] = 2.500000
arrC[0][3] = 2.500000
arrC[0][4] = 2.500000
arrC[0][5] = 2.500000
arrC[0][6] = 2.500000
arrC[0][7] = 2.500000
arrC[0][8] = 2.500000
arrC[0][9] = 2.500000
arrC[1][0] = 2.500000
arrC[1][1] = 2.500000
arrC[1][2] = 2.500000
arrC[1][3] = 2.500000
arrC[1][4] = 2.500000
arrC[1][5] = 2.500000
arrC[1][6] = 2.500000
arrC[1][7] = 2.500000
arrC[1][8] = 2.500000
Maxim Milakov



Joined: 12 Apr 2012
Posts: 4

Posted: Thu Apr 12, 2012 7:20 am

It seems I have encountered the same problem. I get the following error:

call to cuMemFree returned error 700: Launch failed

It occurs when I use the "async(1)" clause, even if I put an "!$acc wait(1)" directive right after every parallel region. And yes, I use arrays with the ALLOCATABLE attribute.
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Thu Apr 12, 2012 10:29 am

Hi Maxim,

nickaj's problem has to do with C99 VLAs, so your issue is most likely unrelated. Error 700, "Launch failed", is a generic error meaning that your kernel failed for some reason. Can you please give more detail about the error, as well as a reproducing example?

Thanks,
Mat
Maxim Milakov



Joined: 12 Apr 2012
Posts: 4

Posted: Mon Apr 16, 2012 2:03 am

Hi Mat,

I am still trying to narrow down the issue so that I can make a small reproducing example.

I managed to catch the error at an earlier stage.

Consider the following pseudocode:

Code:

!$acc data ...clauses go here...
DO i = 1, n
!$acc parallel
...several parallelizable cycles go here...
!$acc end parallel
DO j = 1, n
!$acc parallel
...other parallelizable cycles go here, each iteration j+1 depends on previous one j...
!$acc end parallel
END DO
END DO
!$acc end data


This code works. The problem is that it executes a lot of parallel regions synchronously: the host waits for each region (kernel) to complete and only then launches the next one. It is slow. It would be highly desirable to push all the kernels to the GPU and wait only for the final one to complete.

My very first step was to mark the first parallel region (in the outer loop) with the "async" clause and then put an "!$acc wait" directive just after the region. I fully understand that we cannot expect such code to run any faster than the original, but I did it just to check whether async is supported. Here is the new code:

Code:

!$acc data ...clauses go here...
DO i = 1, n
!$acc parallel async
...several parallelizable cycles go here...
!$acc end parallel
!$acc wait
DO j = 1, n
!$acc parallel
...other parallelizable cycles go here, each iteration j+1 depends on previous one j...
!$acc end parallel
END DO
END DO
!$acc end data


It always fails with the error "call to cuMemcpyHtoD returned error 1: Invalid value". I traced the code: the error occurs at random iterations of the outer and inner cycles, so I wonder whether async/wait is working at all.

P.S. There are no data transfers in the code that would correspond to the error encountered. I attribute this error to the transfer of some data required to launch the kernel (parameters, etc.).
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Mon Apr 16, 2012 10:52 am

Quote:
I traced the code: The error occurs at random iterations of outer and inner cycles. And thus I wonder whether async/wait is working or not...
Async support is very new, so it's possible that there are problems. However, without an example it's very difficult to tell what's wrong. Note: if the code is too long, or you don't want it posted on a public forum, please send it to PGI customer service (trs@pgroup.com) and ask them to forward it to me.

Quote:
!$acc parallel
...other parallelizable cycles go here, each iteration j+1 depends on previous one j...
!$acc end parallel
If there is a backward dependency, the code is not parallel. What do the -Minfo messages tell you about this loop?

- Mat
All times are GMT - 7 Hours
Page 2 of 3