PGI User Forum

unspecified launch failure
cu239



Joined: 20 Mar 2009
Posts: 14

Posted: Tue Feb 01, 2011 10:41 pm    Post subject: unspecified launch failure

Hi,

I got an error message during a CUDA Fortran program run:
"
0: copyin Symbol Memcpy (dev=0x604e60, host=0x60ce30, size=4, offset=12132) FAILED: 4<unspecified launch failure>
"

Does this mean there is a limit on how many variables I can copy from host to device?

System information:
OS: Windows XP SP3
CPU: Intel core i7-920
GPU: Geforce GTX 460
Compiler: PGI Accelerated Visual Fortran 11.1
CUDA Toolkit version: 3.2

The failed subroutine:
=================================
subroutine init_dev
use All_vars
use dev_vars
implicit none

! Main Parameters:
dte_dev = dte
Ramp_dev = Ramp
HORCON_dev = HORCON
HPRNU_dev = HPRNU
VPRNU_dev = VPRNU
UMOL_dev = UMOL
BFRIC_dev = BFRIC
Z0B_dev = Z0B
z0=z0b
CBCMIN = BFRIC

temp=(ZZ(KBM1)-Z(KB))/z0
zbz0_dev=temp
Print*,"Debug ---- 1"
z0_dev=z0b_dev
CBCMIN_dev=BFRIC_dev
Print*,"Debug ---- 2"

! Basic Extents:
nt_dev = nt
ns_dev = ns
nn_dev = nn
NTT_dev = NTT
Print*,"Debug ---- 3"
KB_dev = KB
KBM1_dev = KBM1
KBM2_dev = KBM2
numebc_dev = numebc
numqbc_dev = numqbc
numfbc_dev = numfbc
Print*,"Debug ---- 4"

... ...
=================================
All "*_dev" variables are 4-byte "real, constant" or "integer, constant" variables.

The screen displays:
"
Debug ---- 1
Debug ---- 2
0: copyin Symbol Memcpy (dev=0x604e60, host=0x60ce30, size=4, offset=12132) FAILED: 4<unspecified launch failure>
"

Has anybody seen this problem before? Thank you!


Bingray
mkcolg



Joined: 30 Jun 2004
Posts: 5871
Location: The Portland Group Inc.

Posted: Wed Feb 02, 2011 9:32 am

Hi Bingray,

It's your device-to-device copies that are causing the problem. We are working on adding support for device-to-device copies, but it's not available yet.

Try changing:
Code:

z0_dev=z0b_dev
CBCMIN_dev=BFRIC_dev

to
Code:

z0_dev=Z0B 
CBCMIN_dev=BFRIC

Hope this helps,
Mat
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Fri Feb 04, 2011 3:53 am

mkcolg wrote:
Hi Bingray,

It's your device-to-device copies that are causing the problem. We are working on adding support for device-to-device copies, but it's not available yet.

Hope this helps,
Mat


Hi Mat, is the documentation ahead of the implementation? From the new CUDA Fortran manual, I thought this feature was supported.

Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 5871
Location: The Portland Group Inc.

Posted: Fri Feb 04, 2011 12:14 pm

Hi Tuan,

Actually, the CUDA Fortran reference guide was written before the implementation. A few things turned out to be very difficult to implement (such as random_number) and have not been added to a release yet.

Device-to-device transfers between variables in global device memory are allowed. However, in this case the user is trying to copy between two variables located in constant memory. I should have been clearer in my response.

I'm not even sure whether CUDA C allows constant-to-constant data transfers. If it does, then this is more of a bug on our part, since we obviously missed it. If it doesn't, then we most likely won't be able to support it either.

Thanks for questioning me when my answers are not as clear as they should be.

- Mat
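
The distinction Mat draws above can be sketched roughly as follows. This is only an illustration, not code from the thread: the module `consts_demo`, the program `copy_demo`, and the names `z0_c`, `z0b_c`, `a_d`, and `b_d` are all hypothetical, and the array size is arbitrary. It assumes a PGI compiler with CUDA Fortran enabled.

```fortran
! Hypothetical module: constant-memory variables must be declared at module scope.
module consts_demo
  use cudafor
  implicit none
  real, constant :: z0_c, z0b_c     ! located in constant memory
end module consts_demo

program copy_demo
  use cudafor
  use consts_demo
  implicit none
  real, device :: a_d(16), b_d(16)  ! located in global device memory
  real :: z0b

  ! Global memory: host-to-device, then device-to-device assignment.
  a_d = 1.0
  b_d = a_d

  ! Constant memory: set both variables from the host value instead of
  ! writing z0_c = z0b_c (the constant-to-constant copy that failed above).
  z0b   = 0.01
  z0b_c = z0b
  z0_c  = z0b
end program copy_demo
```

The design choice is simply to keep a host copy of every value that ends up in constant memory, so that each constant can be initialized directly from the host.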
cu239



Joined: 20 Mar 2009
Posts: 14

Posted: Sat Feb 05, 2011 8:09 am

Hi Mat,


Thanks a lot. It's much better now after I changed the code. However, there is still a similar problem even after I removed all device-to-device copies. I have a very long loop in "A3DMain.f" like this:
Code:
   Do nstep=1,50000
       THOUR = FLOAT(NSTEP) * DTE / 3600.
       RAMP = TANH(FLOAT(NSTEP)/FLOAT(IRAMP+1))
       call B1_dev
       call B2_dev
       call B3_dev
 ...
       call B10_dev
       call B11_dev
       call B12_dev
   Enddo

Each of the B*_dev subroutines is written in a .cuf file and launches several global kernels.
When nstep is 1, the code runs fine from B1_dev through B12_dev. But when nstep is 2, the first transfer statement in B1_dev:
Code:
   Ramp_dev = Ramp

fails with a similar error:
"0: copyin Symbol Memcpy (dev=0x60eac0, host=0x616878, size=4, offset=12072) FAILED: 4<unspecified launch failure>"

I'll check the entire code once again. I would like to know what else could lead to this problem. Thanks!

Bingray
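
One possible way to narrow this down, offered as a sketch rather than a diagnosis: kernel launches are asynchronous, so a kernel that fails late in step 1 may only be reported at the next CUDA call, which here would be the first copy of step 2. The helper below is hypothetical (the module `dev_check` and the subroutine `check_dev` do not appear in the thread). It uses `cudaThreadSynchronize`, which matches the CUDA 3.2 toolkit mentioned above; newer toolkits call this `cudaDeviceSynchronize`.

```fortran
! Hypothetical helper module, not part of the original program.
module dev_check
  use cudafor
  implicit none
contains
  ! Wait for all outstanding device work, then report any pending error,
  ! tagged with the time step and the routine that was just called.
  subroutine check_dev(label, nstep)
    character(len=*), intent(in) :: label
    integer, intent(in) :: nstep
    integer :: istat

    ! Block until previously launched kernels have finished.
    istat = cudaThreadSynchronize()
    ! If the synchronize itself reported nothing, ask for the last error.
    if (istat == cudaSuccess) istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
      print *, "step", nstep, " after ", label, ": ", &
               trim(cudaGetErrorString(istat))
      stop
    end if
  end subroutine check_dev
end module dev_check
```

Inside the time-step loop this could be used as `call B1_dev` followed by `call check_dev("B1_dev", nstep)`, and likewise after each of the other B*_dev calls. The first routine that triggers the message would then be the place to look for an out-of-bounds access or similar kernel fault.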