PGI User Forum

Launch fails for sparse matrix-vector multiplication
rztim



Joined: 12 Jul 2010
Posts: 12

Posted: Mon Jul 12, 2010 7:50 am    Post subject: Launch fails for sparse matrix-vector multiplication

Hello,

I tried to parallelize a sparse matrix-vector multiplication (CRS) with the PGI Accelerator. For a very small test case (12x12) the kernel sometimes (!) launches, but not always. For bigger test cases it never works. In both cases the error message is:

call to cuMemcpyDtoH returned error 700: Launch failed

Code:


typedef double floatType;

struct MatrixCRS{
  int n;
  int nnz;
  int* ptr;
  int* index;
  floatType* value;
};

// y <- A*x
void matvec(const struct MatrixCRS *restrict A, const floatType *restrict x, floatType *restrict y){
  int i,j;
  floatType *restrict value=A->value;
  int *restrict index = A->index;
  int *restrict ptr=A->ptr;
  const int nnz=A->nnz;
  const int n=A->n;
  #pragma acc region copyin(value[0:nnz-1], ptr[0:n], index[0:n], x[0:n-1]), copyout(y[0:n-1])
  {
  for(i=0; i<A->n; i++){
    y[i]=0;
    for(j=ptr[i]; j<ptr[i+1]; j++){
      y[i]+=value[j]*x[index[j]];
    }
  }
  }//parallel region
}



I'm not sure whether I copied the data to the device correctly. The lengths of the arrays are:
value: nnz
ptr: n+1
index: n+1
x: n
y: n

Is there a mistake in my code, or is this an issue with the PGI compiler? I tried versions 10.5 and 10.6 with a Tesla T10 processor. The compiler feedback looks like this:

Code:

$ pgcc -fastsse -DDEBUG -ta=nvidia,3.0,cc13 -Minfo -g -c solver.c
matvec:
     40, Generating copyin(ptr[:n])
         Generating copyout(y[:n-1])
         Generating copyin(value[:nnz-1])
         Generating copyin(x[:n-1])
         Generating copyin(index[:n])
         Generating compute capability 1.3 binary
     43, Loop is parallelizable
         Accelerator kernel generated
         43, #pragma acc for parallel, vector(256)
             Cached references to size [257] block of 'ptr'
             Using register for 'y'
             CC 1.3 : 19 registers; 1052 shared, 92 constant, 0 local memory bytes; 75 occupancy
     45, Complex loop carried dependence of 'y' prevents parallelization
         Loop carried reuse of 'y' prevents parallelization
         Inner sequential loop scheduled on accelerator


Thanks for your help.

Kind regards,
Tim
rztim



Joined: 12 Jul 2010
Posts: 12

Posted: Fri Jul 16, 2010 12:43 am

Does nobody know why this happens?
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Fri Jul 16, 2010 8:28 am

Hi Tim,

Sorry I missed your post earlier.

I don't see anything obvious. The error "call to cuMemcpyDtoH returned error 700: Launch failed" typically means that your kernel crashed during execution, so my best guess is that there is a memory access violation: for example, one of ptr's values could be greater than the size of value or index. I've seen several programs with array bounds problems like this that 'work' on a CPU but crash on a GPU. Try compiling the CPU code with bounds checking enabled (-Mbounds) to see if anything shows up.
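
A host-side check of this kind can also be written by hand. The following is only a minimal sketch, assuming the `MatrixCRS` layout from the first post; `check_crs`, `index_len` and `value_len` are hypothetical names that do not appear in the original code. It verifies that every access made by the `matvec` loops stays inside the arrays. Note that the inner loop reads `index[j]` and `value[j]` for every `j` below `ptr[n]`, so both arrays must hold (and the copyin clauses must transfer) at least `ptr[n]` elements.

```c
#include <assert.h>
#include <stdio.h>

typedef double floatType;

struct MatrixCRS {
  int n;            /* number of rows (matrix assumed square) */
  int nnz;          /* number of stored nonzeros */
  int *ptr;         /* row offsets, n+1 entries */
  int *index;       /* column index of each stored nonzero */
  floatType *value; /* stored nonzero values */
};

/* Hypothetical helper, not part of the original post.
 * index_len and value_len are the element counts actually allocated
 * (and copied to the device) for A->index and A->value.
 * Returns 0 if all accesses in matvec stay in bounds, -1 otherwise. */
int check_crs(const struct MatrixCRS *A, int index_len, int value_len) {
  int i, j;
  assert(A != NULL && A->ptr != NULL && A->index != NULL);

  if (A->ptr[0] < 0) {
    fprintf(stderr, "ptr[0]=%d is negative\n", A->ptr[0]);
    return -1;
  }
  /* Row offsets must not decrease. */
  for (i = 0; i < A->n; i++) {
    if (A->ptr[i] > A->ptr[i + 1]) {
      fprintf(stderr, "ptr decreases at row %d\n", i);
      return -1;
    }
  }
  /* matvec reads value[j] and index[j] for all j < ptr[n]. */
  if (A->ptr[A->n] > index_len || A->ptr[A->n] > value_len) {
    fprintf(stderr, "ptr[n]=%d exceeds index_len=%d or value_len=%d\n",
            A->ptr[A->n], index_len, value_len);
    return -1;
  }
  /* Every column index must address a valid element of x. */
  for (j = A->ptr[0]; j < A->ptr[A->n]; j++) {
    if (A->index[j] < 0 || A->index[j] >= A->n) {
      fprintf(stderr, "index[%d]=%d is outside [0,%d)\n",
              j, A->index[j], A->n);
      return -1;
    }
  }
  return 0;
}
```

Calling `check_crs(A, index_len, value_len)` on the host just before `matvec`, with the lengths that were really allocated, would flag a mismatch between the array sizes and the ranges the kernel touches without needing the GPU at all.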

If this isn't it, can you please post a driver program that I can use to recreate the problem?

Thanks,
Mat
rztim



Joined: 12 Jul 2010
Posts: 12

Posted: Mon Aug 02, 2010 4:25 am

Hey,

Sorry, I can't see any memory access violations. Compiling with -Mbounds does not change anything. I built a small driver, but I don't want to post it here. Is it possible to send it by email?

Cheers,

Tim
rztim



Joined: 12 Jul 2010
Posts: 12

Posted: Mon Aug 02, 2010 8:01 am

I sent the test driver to: trs@pgroup.com

Could you look into it? Thank you for the support.