PGI User Forum

Using Cublas in Device Kernels

Mr.Savage



Joined: 14 Feb 2013
Posts: 6

Posted: Tue Aug 13, 2013 8:06 pm    Post subject: Using Cublas in Device Kernels

I want to use cublasDgetrfBatched and cublasDgetriBatched to batch-invert small (6x6) matrices. These functions are called after a syncthreads call, halfway through a device kernel. The idea is to invert one thread block's worth of these small matrices as a single batch.
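For reference, the per-matrix operation I am after can be sketched in plain host Fortran, which is handy for checking whatever the batched routines return. This is only a sketch under stated assumptions: it uses Gauss-Jordan elimination with partial pivoting rather than the LU factorization that getrf/getri perform, it runs on the host and not in a device kernel, and the names small_inverse, invert_small and the batch size of 4 are made up for illustration.

```fortran
module small_inverse
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 6          ! matrix size from the post (6x6)
contains
  ! Invert one n-by-n matrix in place by Gauss-Jordan elimination with
  ! partial pivoting. info = 0 on success, otherwise the column in which
  ! a zero pivot was found.
  subroutine invert_small(a, info)
    real(dp), intent(inout) :: a(n,n)
    integer,  intent(out)   :: info
    real(dp) :: aug(n,2*n), row(2*n), pivot
    integer  :: i, j, p

    info = 0
    ! Build the augmented matrix [A | I]
    aug = 0.0_dp
    aug(:,1:n) = a
    do i = 1, n
      aug(i,n+i) = 1.0_dp
    end do

    do j = 1, n
      ! Partial pivoting: largest entry in column j at or below row j
      p = j - 1 + maxloc(abs(aug(j:n,j)), dim=1)
      if (aug(p,j) == 0.0_dp) then
        info = j
        return
      end if
      if (p /= j) then
        row = aug(j,:)
        aug(j,:) = aug(p,:)
        aug(p,:) = row
      end if
      ! Scale the pivot row, then eliminate column j from every other row
      pivot = aug(j,j)
      aug(j,:) = aug(j,:) / pivot
      do i = 1, n
        if (i /= j) aug(i,:) = aug(i,:) - aug(i,j) * aug(j,:)
      end do
    end do

    ! The right half of [I | inv(A)] is the inverse
    a = aug(:,n+1:2*n)
  end subroutine invert_small
end module small_inverse

program batch_invert_demo
  use small_inverse
  implicit none
  integer, parameter :: nbatch = 4     ! hypothetical batch size
  real(dp) :: a(n,n,nbatch), ainv(n,n,nbatch), eye(n,n), err
  integer  :: b, i, info

  eye = 0.0_dp
  do i = 1, n
    eye(i,i) = 1.0_dp
  end do

  ! Random test matrices, made strictly diagonally dominant so that
  ! they are guaranteed to be invertible
  call random_number(a)
  do b = 1, nbatch
    do i = 1, n
      a(i,i,b) = a(i,i,b) + real(n, dp)
    end do
  end do

  ainv = a
  err = 0.0_dp
  do b = 1, nbatch
    call invert_small(ainv(:,:,b), info)
    if (info /= 0) error stop 'singular matrix'
    err = max(err, maxval(abs(matmul(a(:,:,b), ainv(:,:,b)) - eye)))
  end do

  print *, 'max |A*inv(A) - I| over batch:', err
  if (err > 1.0e-10_dp) error stop 'inverse check failed'
end program batch_invert_demo
```

The program stops with an error if any product A*inv(A) differs from the identity by more than 1e-10, so the same check could be applied to inverses copied back from the device.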

When I compile, I get two messages: "calls from device code to a host function are allowed only in emulation mode", and a complaint that the function has not been declared. I think this simply means that the library functions are not being found through the module USE cublas_device. I can't find good, up-to-date documentation on what the predefined modules contain.

Do I need to define my own interface for these library functions? I am using a K20 card, CUDA 5.0, and compiler version 13.4.

I modeled my cublas call after the cublas example in
http://www.pgroup.com/lit/articles/insider/v5n1a2.htm

where a cublas call looked like the following:

Code:
CONTAINS
  attributes(global) subroutine dgemm16(a, b, c, m, n, k)
    use cublas_device
    integer, value :: m, n, k
    double precision, device :: a(m,*), b(k,*), c(m,*)
    double precision, device :: alpha, beta
    type(cublasHandle) :: ch1
    integer transa, transb
    i = threadIdx%x
    if (i.eq.1) then
        istat = cublasCreate_v2(ch1)
        alpha = 1.0d0
        beta  = 0.0d0
        transa = cublas_op_n
        transb = cublas_op_n
        istat = cublasDgemm_v2(ch1, transa, transb, m, n, k, alpha, &
                                   a, m, b, k, beta, c, m)
        istat = cublasDestroy_v2(ch1)
    end if
    return
    end subroutine

This snippet is compiled with
Code:
  pgf90 -Mcuda=cuda5.0,cc35,rdc -fast dgemmdevcublas.cuf -o dgemmdevcublas.exe -lcublas_device


My makefile is:

Code:
FLAGS = -V13.4 -fast -Mconcur=innermost
FLAGS_CUDA = -Mcuda=cuda5.0,cc35,rdc -tp:x64 -lcublas_device
F90 = pgf90

# Variables
SOURCES = Variables.f90 CUDA_Kernels.f90 cpty.f90
# No space after the colon, otherwise the .f90 suffix is never matched
OBJECTS = $(SOURCES:.f90=.o)
EXECUTABLE = CUDA_Parent

all: $(EXECUTABLE)

# Link the object files into the executable
$(EXECUTABLE): $(OBJECTS)
	$(F90) $(FLAGS) $(FLAGS_CUDA) $(OBJECTS) -o $@

# Compile each .f90 source into an object file
%.o: %.f90
	$(F90) $(FLAGS) $(FLAGS_CUDA) -c $< -o $@

# Cleans
.PHONY: all clean
clean:
	rm -f *.mod *.o $(EXECUTABLE)