PGI User Forum

Unspecified launch failure

 
PGI User Forum Forum Index -> Accelerator Programming
Greg Poirier



Joined: 05 Oct 2010
Posts: 2

Posted: Tue Oct 05, 2010 8:47 am    Post subject: Unspecified launch failure

Hello,

I've recently run into the following error:

Unspecified launch failure

from cudaGetErrorString(cudaGetLastError())

I realize this error is usually the device-side equivalent of a "segmentation fault," but I can't explain it that way either. Here are snippets of the code in question:

The kernel:
Code:

    attributes(global) subroutine fft_kernel( Nq1, Nq2, Ngrid, Na, Nmode, Nind, Nline, AqqRealDev, AqqImgDev, phaseDev, TermIndexDev, AindDev )

        implicit none
        integer, parameter :: nspace = 3
        integer, value  :: Ngrid, Na, Nmode, Nind, Nq1, Nq2, Nline
        integer :: ii, jj, kk, inz
        real*4                              :: phasefactor
        real*4, dimension(Ngrid, Na, Na) :: phaseDev
        real*4, dimension(Nline,0:2*nspace) :: TermIndexDev
        real*4, dimension(-Nind:Nind) :: AindDev
        real*4, dimension(Nmode,Nmode,Nmode) :: AqqRealDev
        real*4, dimension(Nmode,Nmode,Nmode) :: AqqImgDev
        real*4                          :: tmp

        inz = blockIdx%x * blockDim%x + threadIdx%x
        ii = TermIndexDev(inz, 1)
        jj = TermIndexDev(inz, 2)
        kk = TermIndexDev(inz, 3)

        phasefactor = phaseDev(Nq1, TermIndexDev(inz, 4), TermIndexDev(inz, 6)) + phaseDev(Nq2, TermIndexDev(inz, 5), TermIndexDev(inz, 6))

        tmp = AindDev(TermIndexDev(inz, 0)) * cos(phasefactor)
        AqqRealDev(ii, jj, kk) = AqqRealDev(ii,jj,kk) + tmp

    end subroutine fft_kernel


I've verified that the code executes up until the last assignment to AqqRealDev. I tried changing the last few lines to:

Code:

    tmp = AindDev(TermIndexDev(inz, 0)) * cos(phasefactor)
    tmp2 = AqqRealDev(ii,jj,kk)
    tmp3 = tmp2 + tmp
    AqqRealDev(ii,jj,kk) = tmp3


If I comment out the last line, the code executes without error. If I run it as above, I get the unspecified launch failure again.

Ideas? Have I missed something glaringly obvious?
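One likely culprit worth checking: in CUDA Fortran the built-in blockIdx%x and threadIdx%x start at 1, not 0, so inz = blockIdx%x * blockDim%x + threadIdx%x never produces indices 1..blockDim%x and runs blockDim%x past the end of the work list. Those extra threads read rows of TermIndexDev that do not exist, so the ii, jj, kk they load can point anywhere, and the store to AqqRealDev faults. When the store is commented out, the loads feed nothing and the compiler is free to drop them as dead code, which would explain why the failure disappears. The usual form is inz = (blockIdx%x - 1) * blockDim%x + threadIdx%x, followed by a guard such as if (inz > Nline) return for the partially filled last block. The sketch below emulates a launch on the host in plain Fortran (no GPU needed) and counts how many threads would index outside 1..Nline under each formula; the module and function names are hypothetical, chosen only for illustration.

```fortran
! Host-side emulation of CUDA Fortran thread indexing; runs without a GPU.
! In CUDA Fortran, blockIdx%x and threadIdx%x both start at 1.
module launch_index
  implicit none
contains

  ! Index formula as written in fft_kernel.
  pure integer function index_as_posted(bidx, bdim, tidx)
    integer, intent(in) :: bidx, bdim, tidx
    index_as_posted = bidx * bdim + tidx
  end function index_as_posted

  ! 1-based global index: thread 1 of block 1 gets index 1.
  pure integer function index_one_based(bidx, bdim, tidx)
    integer, intent(in) :: bidx, bdim, tidx
    index_one_based = (bidx - 1) * bdim + tidx
  end function index_one_based

  ! Blocks needed to cover n work items (ceiling division).
  pure integer function blocks_for(n, bdim)
    integer, intent(in) :: n, bdim
    blocks_for = (n + bdim - 1) / bdim
  end function blocks_for

  ! Emulates a launch of blocks_for(n, bdim) blocks of bdim threads and counts
  ! the threads whose index falls outside 1..n. Without a guard such as
  ! "if (inz > n) return", each of these threads would index out of range.
  pure integer function count_out_of_range(n, bdim, use_posted) result(bad)
    integer, intent(in) :: n, bdim
    logical, intent(in) :: use_posted
    integer :: b, t, inz
    bad = 0
    do b = 1, blocks_for(n, bdim)
      do t = 1, bdim
        if (use_posted) then
          inz = index_as_posted(b, bdim, t)
        else
          inz = index_one_based(b, bdim, t)
        end if
        if (inz < 1 .or. inz > n) bad = bad + 1
      end do
    end do
  end function count_out_of_range

end module launch_index
```

For example, with Nline = 100 and 32 threads per block, blocks_for gives 4 blocks (128 threads). For the 1-based formula, count_out_of_range reports 28 threads past the end, all of which the guard stops; for the posted formula it reports 60, and rows 1 to 32 are never touched at all. Separately, if several rows of TermIndexDev share the same (ii, jj, kk), concurrent threads race on the read-add-store to AqqRealDev(ii,jj,kk). That would not cause a launch failure, but it can silently lose updates.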
Greg Poirier



Joined: 05 Oct 2010
Posts: 2

Posted: Tue Oct 05, 2010 11:06 pm

Further information...

I compiled in device emulation mode and checked everything in the debugger. All of my device variables have reasonable memory addresses and are indexable. When I run the compiled CUDA Fortran code on the GPU, it executes the kernel, returns from it, and then segfaults while copying from device back to host. So I can only assume that it is still faulting somewhere, but I can't figure out where. It may be an off-by-one error or a stray index somewhere. I'll post more after I narrow it down.

As always, tips are appreciated.
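Two points that may fit these symptoms. Kernel launches are asynchronous, so a fault inside the kernel is typically reported at the next call that synchronizes with the device, such as the device-to-host copy; a crash at the copy does not necessarily mean the copy itself is wrong. And in emulation mode an out-of-range access may simply land in other host memory instead of crashing, which could be why the debugger run looked clean. Besides the thread index, every subscript the kernel reads from TermIndexDev must be in range for the array it indexes, and a cheap way to rule that out is to check the table on the host before copying it over. The sketch below assumes an integer copy of the table with the same column layout the kernel uses (Fortran requires integer subscripts, so declaring TermIndexDev as integer rather than real*4 is also worth doing); the module and function names are hypothetical.

```fortran
! Host-side sanity check of the TermIndex table before it is copied to the
! device. Column layout follows fft_kernel:
!   col 0      -> AindDev index, must lie in -nind..nind
!   cols 1..3  -> AqqRealDev indices, must lie in 1..nmode
!   cols 4..6  -> phaseDev 2nd/3rd-dimension indices, must lie in 1..na
module term_index_check
  implicit none
contains

  ! Returns the first row with an out-of-range entry, or 0 if all rows are valid.
  pure integer function first_bad_row(term, nline, nmode, na, nind) result(row)
    integer, intent(in) :: nline, nmode, na, nind
    integer, intent(in) :: term(nline, 0:6)
    integer :: i, c
    row = 0
    do i = 1, nline
      if (term(i, 0) < -nind .or. term(i, 0) > nind) then
        row = i
        return
      end if
      do c = 1, 3
        if (term(i, c) < 1 .or. term(i, c) > nmode) then
          row = i
          return
        end if
      end do
      do c = 4, 6
        if (term(i, c) < 1 .or. term(i, c) > na) then
          row = i
          return
        end if
      end do
    end do
  end function first_bad_row

end module term_index_check
```

Calling first_bad_row on the host-side table just before the copy, and stopping if it returns a nonzero row, separates a bad table entry from a bad thread index.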
All times are GMT - 7 Hours