PGI User Forum

[OpenACC Fortran] Linear algebra in kernel loop
 
gaoyisheng



Joined: 05 Nov 2007
Posts: 6

Posted: Wed Jul 03, 2013 12:15 pm    Post subject: [OpenACC Fortran] Linear algebra in kernel loop

I was trying to perform linear algebra operations inside an OpenACC kernel loop, e.g., matrix inversion (n = 6, ntotal = 1000):

Code:
real*8  a(n, n, ntotal), b(n,n), c(n)

!$acc region
!$acc loop kernel independent  private(b, c)
do i = 1, ntotal
  ! ...
  ! Invert matrix a(:, :, i) and store the result back in a(:, :, i);
  ! b and c are the auxiliary local arrays this requires.
  ! ...
enddo
!$acc end region


However, when ntotal is large, e.g., ntotal > 200, I get an error message like the one below:

call to cuMemFree returned error 700: Launch failed

I think the compiler allocates memory for the "private" b and c arrays as n*n*ntotal and n*ntotal elements, so that each iteration has its own copy. But that takes up too much memory.

Is anyone else working on matrix linear algebra that requires local matrices? I guess this is a common issue when handled naively, as I did. I am keen to know how to get around it.

Any comments are very welcome!
AROM



Joined: 03 Apr 2013
Posts: 39

Posted: Thu Jul 04, 2013 2:47 am

Hi!

Do you really want to implement it with OpenACC?
Use the cuBLAS library.

Alexey
Malcolm Bibby



Joined: 16 Nov 2009
Posts: 33

Posted: Tue Jul 09, 2013 12:47 pm

Alexey. He said he is learning! Let him get an answer!!

Malcolm
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Jul 09, 2013 4:48 pm

Hi e3lb89cz,

Quote:
I think that the compiler specifies memory for the "private" b and c arrays like n*n*ntotal, and n*ntotal to make them private enough.
Each thread will get its own copy of b and c, so the number of copies may or may not be equal to ntotal.

Quote:
But this account too much memory
It's definitely possible to run out of memory by having too much private data, but I'm not convinced that's what's happening here. At N=6, NTOTAL=200, and assuming the number of threads is 256, your data usage is very small.
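
As a rough check of that estimate, the arithmetic can be sketched as below. This assumes 8-byte reals and one private copy of b(n,n) and c(n) per thread (or per iteration); how the compiler actually allocates private data may differ. The names private_mem_estimate and private_bytes are made up for illustration.

```fortran
module private_mem_estimate
  use iso_fortran_env, only: int64
  implicit none
contains
  ! Bytes needed if each of ncopies threads (or iterations) holds its own
  ! b(n,n) and c(n), stored as 8-byte reals. This is an estimate only; it
  ! is an assumption that private data is laid out this simply.
  pure function private_bytes(n, ncopies) result(nbytes)
    integer, intent(in) :: n, ncopies
    integer(int64) :: nbytes
    nbytes = int(n*n + n, int64) * 8_int64 * int(ncopies, int64)
  end function private_bytes
end module private_mem_estimate
```

For n = 6 each copy is 336 bytes, so 256 copies need 86,016 bytes, and even one copy per iteration at ntotal = 1000 needs only 336,000 bytes.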

Quote:
call to cuMemFree returned error 700: Launch failed
This typically means that the kernel that was launched before this call to cuMemFree crashed for some reason. Why? I'd need a reproducing example to find out. The first thing to do, though, is to make sure the host version is correct and that you're not hitting any out-of-bounds errors (add the -Mbounds flag to check) or other array access issues.
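
One way to check the host version before moving to the device is sketched below: an in-place Gauss-Jordan inversion with partial pivoting, plus a residual check. The original post does not show its inversion code, so this is a stand-in and not the poster's algorithm. The names host_inverse_check, invert_in_place and max_residual are hypothetical.

```fortran
module host_inverse_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Invert a(n,n) in place using Gauss-Jordan elimination with partial
  ! pivoting. b and c are local work arrays, as in the original post.
  ! ok is .false. if an exactly zero pivot is found; a is then left
  ! partially reduced. Near-singular matrices are not detected.
  subroutine invert_in_place(a, n, ok)
    integer, intent(in) :: n
    real(dp), intent(inout) :: a(n, n)
    logical, intent(out) :: ok
    real(dp) :: b(n, n), c(n)
    real(dp) :: pivot, factor
    integer :: i, j, p

    b = 0.0_dp
    do i = 1, n
      b(i, i) = 1.0_dp
    end do
    ok = .true.

    do j = 1, n
      ! Pick the largest entry in column j on or below the diagonal.
      p = j - 1 + maxloc(abs(a(j:n, j)), dim=1)
      if (a(p, j) == 0.0_dp) then
        ok = .false.
        return
      end if
      if (p /= j) then
        c = a(j, :); a(j, :) = a(p, :); a(p, :) = c
        c = b(j, :); b(j, :) = b(p, :); b(p, :) = c
      end if
      ! Scale the pivot row, then eliminate column j from all other rows.
      pivot = a(j, j)
      a(j, :) = a(j, :) / pivot
      b(j, :) = b(j, :) / pivot
      do i = 1, n
        if (i /= j) then
          factor = a(i, j)
          a(i, :) = a(i, :) - factor * a(j, :)
          b(i, :) = b(i, :) - factor * b(j, :)
        end if
      end do
    end do
    a = b
  end subroutine invert_in_place

  ! Largest absolute entry of a*ainv - I; should be near machine precision
  ! for a well-conditioned matrix.
  function max_residual(a, ainv, n) result(r)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n, n), ainv(n, n)
    real(dp) :: r, prod(n, n)
    integer :: i
    prod = matmul(a, ainv)
    do i = 1, n
      prod(i, i) = prod(i, i) - 1.0_dp
    end do
    r = maxval(abs(prod))
  end function max_residual
end module host_inverse_check
```

Keeping a copy of each original matrix and calling max_residual on it after inversion gives a simple pass/fail number to compare between the host and accelerator builds.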

- Mat
gaoyisheng



Joined: 05 Nov 2007
Posts: 6

Posted: Wed Jul 24, 2013 7:47 am

I have distilled the code down to a version that only copies memory. Please take a look and try to compile and run it. You may get the same error message.

Quote:
call to cuMemFree returned error 700: Launch failed


As far as I can tell from my checks, there are no out-of-bounds errors in the host version. If you happen to know how this error occurs, please let me know. I am keen to overcome this issue. Thanks a lot in advance!

Code:
program inversematrix

  implicit   real*8 (a-h,o-z)

  real*8  a(6,6,10000)
  real*8  c(6,6), L(6,6), U(6,6), b(6), d(6), x(6)

  niter = 10000
  n = 6

  a = 0.0d0
  do ie = 1, niter
    do i = 1, n
      a(i,i,ie) = 1.0d0
    enddo
  enddo

!$acc data region
!$acc region
!$acc loop kernel independent private(c,L,U,b,d,x)
  do ie = 1, niter

    c(:,:)=a(:,:,ie)
    L=c
    U=L
    b(:) = U(:,1)
    d=b
    x=d
    a(:,:,ie)=L(:,:)

  enddo
!$acc end region 
!$acc end data region

end program inversematrix
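
For comparison, the same copy-only loop can be sketched with OpenACC 1.0 directives (a combined parallel loop with explicit copy and private clauses) instead of the older region syntax used above. It is an assumption that this form is easier for the compiler to handle, and it is not confirmed to avoid the launch failure. The names copy_only_kernel and copy_only_loop are hypothetical.

```fortran
module copy_only_kernel
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 6     ! matrix size from the reproducer
contains
  ! Copy each a(:,:,ie) through private work arrays and back again.
  ! On return a should be unchanged, which makes the result easy to check.
  subroutine copy_only_loop(a, niter)
    integer, intent(in) :: niter
    real(dp), intent(inout) :: a(n, n, niter)
    real(dp) :: c(n, n), L(n, n), U(n, n), b(n), d(n), x(n)
    integer :: ie

    ! Without an OpenACC compiler the directives are plain comments and
    ! the loop runs serially on the host.
    !$acc parallel loop copy(a) private(c, L, U, b, d, x)
    do ie = 1, niter
      c(:, :) = a(:, :, ie)
      L = c
      U = L
      b(:) = U(:, 1)
      d = b
      x = d
      a(:, :, ie) = L(:, :)
    end do
    !$acc end parallel loop
  end subroutine copy_only_loop
end module copy_only_kernel
```

Building with pgfortran -acc -Minfo=accel prints how the compiler scheduled the loop and handled the private arrays, which may help narrow down where the kernel fails.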





