PGI User Forum

cuCtxSynchronize error 700

 
bollig



Joined: 14 Jul 2009
Posts: 5

Posted: Wed Jul 15, 2009 3:47 pm    Post subject: cuCtxSynchronize error 700

I am trying to accelerate a single subroutine. If I put a compute region around the full content of the routine I get an error:

"call to cuCtxSynchronize returned error 700: Launch failed"

However, if I split the routine up into two compute regions, everything executes correctly (extra unwanted overhead, but it works). It's not clear to me why this happens.

Looking at the code, it's possible that there are bank conflicts (it is a 2D FD stencil), but the kernel should still launch. More detail on the error, or reports of similar experience, would be appreciated.

A quick search brought up this forum post: http://forums.nvidia.com/lofiversion/index.php?t93295.html

Could it be that the PGI compiler is making the same cuParamSeti mistake? I AM on an x86_64 system with 4 Tesla 1060s...
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Thu Jul 16, 2009 8:55 am    Post subject:

Hi bollig,

This is a generic error, usually meaning that there was a memory error (such as a segv) when transferring memory over to the GPU. Most likely the two loops share some common arrays where the compiler is getting confused about the bounds. I found a similar issue in one of the codes I was working on and reported it to our engineers. I was able to work around the problem by using the "copy", "copyin" and "copyout" clauses to explicitly set the array bounds. Use the "-Minfo=accel" messages produced during compilation to see what bounds the compiler is using.
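A minimal sketch of that workaround, assuming a hypothetical 5-point stencil routine (the names stencil, a, b, n and m are illustrative and are not taken from the original poster's code). It only shows the shape of the directives: each array gets explicit lower and upper bounds in a copyin or copyout clause instead of leaving the bounds to the compiler.

```fortran
c-----Illustrative 5-point stencil; all names and sizes are hypothetical
      subroutine stencil(n, m, a, b)
      implicit none
      integer n, m, i, j
      real a(0:n+1,0:m+1), b(n,m)

c-----State the exact section of each array that the region touches
!$acc region
!$acc& copyin(a(0:n+1,0:m+1))
!$acc& copyout(b(1:n,1:m))
      do j = 1, m
         do i = 1, n
            b(i,j) = 0.25 * (a(i-1,j) + a(i+1,j)
     &                     + a(i,j-1) + a(i,j+1))
         enddo
      enddo
!$acc end region
      end

c-----Small driver: with a = 1 everywhere, each b(i,j) is 0.25*4 = 1
      program demo
      implicit none
      integer n, m
      parameter (n = 4, m = 3)
      real a(0:n+1,0:m+1), b(n,m)
      a = 1.0
      call stencil(n, m, a, b)
      print *, b(1,1)
      end
```

Building this with something along the lines of "pgfortran -ta=nvidia -Minfo=accel" (the exact driver name depends on the installed PGI release) should print one "Generating copyin" or "Generating copyout" line per array, which can then be compared against the bounds written in the clauses. A compiler that does not recognize the !$acc sentinel treats those lines as comments, so the same file also runs on the host.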

If you can, please send a report to PGI customer service (trs@pgroup.com) and include the source (plus any data files and build instructions if needed).

Thanks,
Mat
TheMatt



Joined: 06 Jul 2009
Posts: 306
Location: Greenbelt, MD

Posted: Wed Aug 26, 2009 11:36 am    Post subject:

Mat,

I have a new case of this, so I'm bumping this thread (the title is about what I'd use). If I don't use local, copyin, and copyout, my code errors out with a cuMemcpy2D error. But adding them leads to a cuCtxSynchronize error. In case it's something simple (like a dimension I'm missing), I'm reproducing my variable declarations and !$acc region lines:
Code:
c-----input parameters

      integer m,np,ict,icb,ih1,ih2,im1,im2,is1,is2
      real rr(m,0:np+1,2),tt(m,0:np+1,2),td(m,0:np+1,2)
      real rs(m,0:np+1,2),ts(m,0:np+1,2)
      real cc(m,3)

c-----temporary array

      integer i,k,ih,im,is
      real rra(m,0:np+1,2,2),tta(m,0:np+1,2,2),tda(m,0:np+1,2,2)
      real rsa(m,0:np+1,2,2),rxa(m,0:np+1,2,2)
      real ch(m),cm(m),ct(m),flxdn(m,0:np+1)
      real fdndir(m),fdndif(m),fupdif
      real denm,xx,yy

c-----output parameters

      real fclr(m,np+1),fall(m,np+1)
      real fsdir(m),fsdif(m)

!$acc region
!$acc& copyin(rr(1:m,0:np+1,1:2),
!$acc& tt(1:m,0:np+1,1:2),
!$acc& td(1:m,0:np+1,1:2),
!$acc& rs(1:m,0:np+1,1:2),
!$acc& ts(1:m,0:np+1,1:2),
!$acc& cc(1:m,1:3))
!$acc& copyout(fclr(1:m,1:np+1),
!$acc& fall(1:m,1:np+1),
!$acc& fsdir(1:m),
!$acc& fsdif(1:m))
!$acc& local(rra(1:m,0:np+1,1:2,1:2),
!$acc& tta(1:m,0:np+1,1:2,1:2),
!$acc& tda(1:m,0:np+1,1:2,1:2),
!$acc& rsa(1:m,0:np+1,1:2,1:2),
!$acc& rxa(1:m,0:np+1,1:2,1:2),
!$acc& ch(1:m),
!$acc& cm(1:m),
!$acc& ct(1:m),
!$acc& flxdn(1:m,0:np+1),
!$acc& fdndir(1:m),
!$acc& fdndif(1:m))

As near as I can tell, I have the array dimensions correct. Upon compiling, the -Minfo=accel outputs:
Code:
   2316, Generating copyin(td(:m,:np+1,:))
         Generating copyin(tt(:m,:np+1,:))
         Generating copyin(rs(:m,:np+1,:))
         Generating copyin(rr(:m,:np+1,:))
         Generating copyin(ts(:m,:np+1,:))
         Generating local(ch(:m))
         Generating copyin(cc(:m,:))
         Generating local(cm(:m))
         Generating copyout(fsdir(:m))
         Generating copyout(fsdif(:m))
         Generating local(flxdn(:m,:np+1))
         Generating local(fdndif(:m))
         Generating local(fdndir(:m))
         Generating local(tta(:m,:np+1,:,:))
         Generating local(tda(:m,:np+1,:,:))
         Generating local(rsa(:m,:np+1,:,:))
         Generating local(ct(:m))
         Generating local(rxa(:m,:np+1,:,:))
         Generating local(rra(:m,:np+1,:,:))
         Generating copyout(fclr(:m,:np+1))
         Generating copyout(fall(:m,:np+1))

The compiler seems to have reduced some dimensions to ":". Could that do it? Or should I use ct(m) rather than ct(1:m), say?

ETA: I'm suspecting a bug after some trial and error. I'll submit it to Customer Service.
All times are GMT - 7 Hours