PGI User Forum
invalid device function
mcoffey



Joined: 26 Mar 2011
Posts: 16

Posted: Sun Oct 02, 2011 4:43 am    Post subject: invalid device function

I'm getting error messages from my CUDA card that I'm finding all but impossible to solve. After 4 weeks I am a failure. The card I'm developing with is a Quadro FX 1600M. I'm aware it's a cc11 device and that the second message is related to it being less than cc13, but I don't know what the error relates to or how to get round it (if indeed I can). Is it simply that I can't call kernels on this card?

I'm simply trying to run the mmul example from the manual.

I compile with
pgfortran.exe -V -ta=nvidia:cuda4.0,nofma,time,wait -Mconcur -Mvect -Mcuda=cc11,ptxinfo -mp -Mchkfpstk -Mipa=fast,inline -Mpfi -Minfo -o mgeno2_MC.exe mgeno2_MC.f90


The error message I get is
42
limit is not supported on this architecture
when I execute this line
Cdev = Csub(1:N,1:L)

and
8
invalid device function

Error! Kernel failed!
when calling the kernel
call mmul_kernel<<<dimGrid>>>( Adev, Bdev, Cdev, N, M, L )

The properties of the card are:
Cuda REset device:
no error

number of cuda devices: 1
got cuda device: 0
Cuda set device:
no error

cuda_prop name:
Quadro FX 1600M



cuda_prop major: 100
cuda_prop minor: 10
cuda_prop MP count: 4
cuda_prop GlobalMem: 268107776
cuda_prop ConstMem: 65536
cuda_prop MemPerBlock: 16384
cuda_prop WarpSize: 32
cuda_prop maxThreadsPerBlock: 512
cuda_prop maxThreadsDim(1): 512
cuda_prop maxThreadsDim(2): 512
cuda_prop maxThreadsDim(3): 64
cuda_prop maxGridSize(1): 65535
cuda_prop maxGridSize(2): 65535
cuda_prop maxGridSize(3): 1
cuda_prop asyncEngineCount: 1
cuda_prop integrated: 0
cuda_prop canMapHostMemory: 1
cuda_prop concurrentKernels: 0
cuda_prop ECCEnabled: 0
cudaLimitStackSize: 0
cudaLimitMallocHeapSize: 0
Cuda Driver Version: 4000
Cuda RunTime Version: 4000

I note the stack size is zero; I'm not sure why.

Any help would be very gratefully received, since I'm now going mad.
Mike
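For readers who want to produce a property listing like the one in this post, a minimal CUDA Fortran sketch follows. It assumes the `cudafor` module and a source file compiled with `-Mcuda`; the program name `query_device` is hypothetical and this is not Mike's actual code.

```fortran
! Sketch only: print the name and compute capability of each CUDA device.
program query_device
  use cudafor
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, idev

  ! Ask the runtime how many devices it can see.
  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess) then
    print *, 'cudaGetDeviceCount failed: ', trim(cudaGetErrorString(istat))
    stop
  end if

  ! Device numbers are zero-based.
  do idev = 0, ndev - 1
    istat = cudaGetDeviceProperties(prop, idev)
    if (istat /= cudaSuccess) cycle
    print '(a,i0,2a)', 'device ', idev, ': ', trim(prop%name)
    print '(a,i0,a,i0)', '  compute capability ', prop%major, '.', prop%minor
  end do
end program query_device
```

Checking every return code, rather than only printing the properties, makes it easier to tell which call first reported a problem.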
mkcolg



Joined: 30 Jun 2004
Posts: 6146
Location: The Portland Group Inc.

Posted: Mon Oct 03, 2011 9:35 am

Hi Mike,

Quote:
Is it simply that I can't call kernels on this card?
While I don't know the specifics of your device, I would expect it to be CUDA capable. I can run the example on my laptop's Quadro FX 880M.

Are you able to run CUDA C code?

What happens if you simplify your flag set to just:
Code:
pgfortran.exe -V -Mcuda=cc11 -Minfo -o mgeno2_MC.exe mgeno2_MC.f90


Quote:
I note the stack size is zero; I'm not sure why.
I find this suspicious as well, especially given the "limit is not supported on this architecture" error.

- Mat
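The message "limit is not supported on this architecture" is the CUDA runtime's text for `cudaErrorUnsupportedLimit`, which is returned when a limit is queried or set on a device that does not support it. The sketch below shows one way to query the stack-size limit and check the return code. It is an assumption that the original program made a similar call; `cudaDeviceGetLimit` requires a CUDA 4.0 or later toolkit, and the program name is hypothetical.

```fortran
! Sketch only: query the per-thread stack-size limit and report any error.
program query_stack_limit
  use cudafor
  implicit none
  integer(kind=cuda_count_kind) :: stack_bytes
  integer :: istat

  stack_bytes = 0
  istat = cudaDeviceGetLimit(stack_bytes, cudaLimitStackSize)
  if (istat /= cudaSuccess) then
    ! On a device that lacks this limit the value is not meaningful.
    print *, 'stack-size limit unavailable, code ', istat, ': ', &
             trim(cudaGetErrorString(istat))
  else
    print *, 'cudaLimitStackSize = ', stack_bytes
  end if
end program query_stack_limit
```

If the return code is ignored, a zero left in the variable could look like a real limit, and the error may only surface at a later check on an unrelated statement.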
mcoffey



Joined: 26 Mar 2011
Posts: 16

Posted: Mon Oct 03, 2011 10:34 am    Post subject: invalid device

Mat, thanks for the reply. I ran the matrix multiply example from the CUDA4 SDK Browser, which I think is written in C; the output is below (after the deviceQuery results).
So it looks like the card can run CUDA. Are you suggesting it's a Fortran issue?

I can send the code and data file if that will help, although all it is is an implementation of the matmul example.

Thanks
Mike

Device 0: "Quadro FX 1600M"
CUDA Driver Version / Runtime Version 4.0 / 4.0
CUDA Capability Major/Minor version number: 1.1
Total amount of global memory: 256 MBytes (268107776 bytes)
( 4) Multiprocessors x ( 8) CUDA Cores/MP: 32 CUDA Cores
GPU Clock Speed: 1.25 GHz
Memory Clock rate: 800.00 Mhz
Memory Bus Width: 128-bit
Max Texture Dimension Size (x,y,z) 1D=(8192), 2D=(65536,32768), 3D=(2048,2048,2048)
Max Layered Texture Size (dim) x layers 1D=(8192) x 512, 2D=(8192,8192) x 512
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Concurrent copy and execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Concurrent kernel execution: No
Alignment requirement for Surfaces: Yes
Device has ECC support enabled: No
Device is using TCC driver mode: No
Device supports Unified Addressing (UVA): No
Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
<Default>

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 4.0, CUDA Runtime Version = 4.0, NumDevs = 1, Device = Quadro FX 1600M
[deviceQuery.exe] test results...
PASSED


[matrixMul.exe] starting...
[ matrixMul ]
C:\Documents and Settings\All Users\Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK 4.0\C\Bin\win32\release\matrixMul.exe Starting (CUDA and CUBLAS tests)...

Device 0: "Quadro FX 1600M" with Compute 1.1 capability

Using Matrix Sizes: A(160 x 320), B(160 x 320), C(160 x 320)

Runing Kernels...

> CUBLAS Throughput = 22.5992 GFlop/s, Time = 0.00072 s, Size = 16384000 Ops

> CUDA matrixMul Throughput = 18.6016 GFlop/s, Time = 0.00088 s, Size = 16384000 Ops, NumDevsUsed = 1, Workgroup = 256

Comparing GPU results with Host computation...

Comparing CUBLAS & Host results
CUBLAS compares OK

Comparing CUDA matrixMul & Host results
CUDA matrixMul compares OK
mkcolg



Joined: 30 Jun 2004
Posts: 6146
Location: The Portland Group Inc.

Posted: Mon Oct 03, 2011 11:50 am

Quote:
So it looks like the card can run CUDA. Are you suggesting it's a Fortran issue?
Possibly. If you're just running our basic matmul.cuf example (http://www.pgroup.com/lit/samples/matmul.CUF), then it's most likely a compatibility issue.

Have you tried compiling with the reduced flag set? My concern is that you have both the PGI Accelerator Model flag (-ta) and the CUDA Fortran flag (-Mcuda), but there are inconsistencies between them (CUDA 4.0, cc11). Since this is a CUDA Fortran code, I'd like to know what happens when you compile without any flags except "-Mcuda=cc11".

If this works, then start adding flags back until you get the failure. If it still fails, then we'll need to dig deeper.

Also, are you running Win64 or Win32? Which compiler version?

- Mat
mcoffey



Joined: 26 Mar 2011
Posts: 16

Posted: Mon Oct 03, 2011 12:20 pm    Post subject: invalid device function

I forgot: Windows XP 32-bit and PGI 11.8.

Hmm, it looks like we may be getting somewhere. I'm sorry, I didn't realise that you couldn't use -ta=nvidia and -Mcuda at the same time.

I removed all flags except
pgfortran.exe -V -Mcuda=cc11 -Minfo -o mgeno2_MC.exe mgeno2_MC.f90

and compiled, but it gave me a few unresolved externals (cudaDeviceReSet()), which I find odd because there are other CUDA calls, e.g. cuda_info = cudaGetDeviceCount(cuda_numdevices),
which are OK. Does that mean they are in different libraries?

I commented the offending ones out and it now compiles and runs. Error 42 has gone, but the invalid device function error remains.

sizeAsub=: 23936
sizeBsub=: 23936
sizeCsub=: 64
sizeCsub1=: 8
sizeCsub2=: 8
mmulmc: Allocating Adev, Bdev and Cdev
mmulmc: dimGrid= 2 2 1
mmulmc: dimBlock= 4 4 1
mmulmc: assigning grid
mmulmc: starting main loop
mmulmc: zeroising Csub
mmulmc: Copy zeroised Csub to Cdev
mmulmc: calling mmul_kernel
8
invalid device function

Warning: ieee_inexact is signaling
Error! Kernel failed!
thread 0 stack: max 2241KB, used 0KB
thread 1 stack: max 0KB, used 0KB
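The trace above reports the failure at the kernel call. A minimal CUDA Fortran sketch of checking a launch in two stages follows, assuming the `cudafor` module and compilation with `-Mcuda`. The kernel `fill_kernel` and the array `a_dev` are hypothetical stand-ins, not the mmul code from this thread. The CUDA runtime documents "invalid device function" as meaning the requested device function does not exist or is not compiled for the proper device architecture, so the first check is the one that would be expected to catch it.

```fortran
! Sketch only: launch a trivial kernel and check for launch and run errors.
module kernels
  use cudafor
  implicit none
contains
  attributes(global) subroutine fill_kernel(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    ! CUDA Fortran thread and block indices start at 1.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) a(i) = 1.0
  end subroutine fill_kernel
end module kernels

program check_launch
  use cudafor
  use kernels
  implicit none
  integer, parameter :: n = 64
  real, device :: a_dev(n)
  real :: a(n)
  type(dim3) :: dimGrid, dimBlock
  integer :: istat

  dimBlock = dim3(16, 1, 1)
  dimGrid  = dim3((n + 15) / 16, 1, 1)

  call fill_kernel<<<dimGrid, dimBlock>>>(a_dev, n)

  ! Stage 1: errors detected when the launch is submitted.
  istat = cudaGetLastError()
  if (istat /= cudaSuccess) then
    print *, 'launch failed, code ', istat, ': ', trim(cudaGetErrorString(istat))
    stop
  end if

  ! Stage 2: errors raised while the kernel runs.
  istat = cudaDeviceSynchronize()
  if (istat /= cudaSuccess) then
    print *, 'kernel failed, code ', istat, ': ', trim(cudaGetErrorString(istat))
    stop
  end if

  a = a_dev                      ! copy the result back to the host
  print *, 'sum = ', sum(a)
end program check_launch
```

Toolkits older than CUDA 4.0 provide `cudaThreadSynchronize` in place of `cudaDeviceSynchronize`. Separating the two checks helps distinguish a kernel image that cannot be loaded on the device from a kernel that starts and then faults.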
All times are GMT - 7 Hours
Page 1 of 2