wiersma
Joined: 16 May 2013 Posts: 3
|
Posted: Thu May 16, 2013 2:15 pm Post subject: Linking CUDA fortran compiled code with ifort |
|
|
Hi all,
Please bear with me if this is obvious - I wouldn't describe myself as an expert in compilers or with Fortran by any stretch of the imagination.
As part of evaluating whether CUDA would suit my needs, I tried dropping one of the matrix multiplication routines (say, the one here: http://geco.mines.edu/software/pg10/gpu/pgicudaforug.pdf) into my code suite to see what happens. I'm having difficulty compiling, though.
Normally my code is arranged with one module per file, so I would have a file named mod1.f90:
| Code: |
module mod1
! blah blah blah
end module mod1
|
And then test.f90 containing:
| Code: |
program test
use mod1
end program test
|
Which I compile using
| Code: |
ifort -c mod1.f90
ifort -c test.f90
ifort -o test mod1.o test.o
|
Simple, right? Now suppose I have a CUDA Fortran module in a file named, say, cmod1.cuf:
| Code: |
module cmod1
use cudafor
! blah blah blah
end module cmod1
|
I try compiling:
| Code: |
pgf90 -c cmod1.cuf
ifort -c test.f90
ifort -o test cmod1.o test.o
|
But get
| Quote: |
test.f90(4): error #7013: This module file was not generated by any release of this compiler. [cmod1]
use cmod1
-------^
|
This happens when trying to compile test.f90. So ifort doesn't like pgf90-compiled modules? Do I have to compile everything with pgf90? That would be a problem, because our suite uses some ifort-specific things.
(Sorry if this was incredibly verbose, I just wanted to be clear :)).
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Thu May 16, 2013 5:19 pm Post subject: |
|
|
Hi wiersma,
| Quote: |
When trying to compile test.f90. So ifort doesn't like pgf90 compiled modules? Do I have to compile everything with pgf90, because our suite uses some ifort specific things.
|
Several F90 features, such as modules and allocatable arrays, are not binary-compatible between Fortran compilers.
What you'll need to do is put the CUDA Fortran routines into a library and then have the main Fortran program call the routines via an F77 or C wrapper.
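A rough, untested sketch of what such an F77-style wrapper might look like. The file name cwrap.cuf and the routine name mmul_wrap are made up for illustration, and it assumes cmod1 provides a routine mmul(A, B, C). The point is that the wrapper lives outside any module and takes explicit-shape arrays plus their extents, so the ifort side needs neither a .mod file nor an array descriptor. It only addresses the calling interface, not how the final executable is linked.

```fortran
! cwrap.cuf (hypothetical file name), compiled with pgf90 after cmod1.cuf.
! An external (non-module) subroutine with explicit-shape dummy arrays.
subroutine mmul_wrap(A, B, C, N, M)
  use cmod1, only: mmul        ! assumption: cmod1 provides mmul(A, B, C)
  implicit none
  integer :: N, M              ! array extents, passed in by the caller
  real :: A(N,M), B(N,M), C(N,M)
  ! The module is only ever seen by pgf90, so assumed-shape dummies
  ! inside cmod1 are fine here.
  call mmul(A, B, C)
end subroutine mmul_wrap
```

The ifort-compiled code would then use "call mmul_wrap(A, B, C, 10, 10)" with no use statement. This relies on both compilers giving external procedures the same symbol name (lowercase with a trailing underscore by default on Linux, as far as I know).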
Hope this helps,
Mat
 |
wiersma
Joined: 16 May 2013 Posts: 3
|
Posted: Tue May 21, 2013 1:45 pm Post subject: |
|
|
Hi all,
Thanks for the reply, but now I'm having difficulty setting up a library.
So now I have a pretty basic example:
cmod1.cuf:
| Code: |
module cmod1
use cudafor
contains
subroutine mmul( A, B, C )
real, dimension(:,:) :: A, B, C
end subroutine mmul
end module cmod1
|
test.f90:
| Code: |
program test
implicit none
real :: A(10,10), B(10,10), C(10,10)
A = 0.
B = 0.
C = 0.
call mmul(A,B,C)
end program test
|
The commands:
| Code: |
pgf90 -c cmod1.cuf
ar rcvs libcmod1.a cmod1.o
ifort -c test.f90
ifort test.o -L. -lcmod1 -o test
|
give:
| Code: |
test.o: In function `MAIN__':
test.f90:(.text+0x750): undefined reference to `mmul_'
|
Sorry if this is basic library stuff - I've tried multiple configurations, but can't seem to figure this out.
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Wed May 22, 2013 11:49 am Post subject: |
|
|
Hi wiersma,
So I had never actually tried calling CUDA Fortran code from Intel-compiled code. It turned out to be a bit trickier than I thought. We do some magic during the initialization of the program that's necessary to get the device working properly. Hence, you have to link with the PGI driver to use CUDA Fortran. Here are the steps I took:
| Code: |
% cat prog.f90
! PGI-compiled main program: just calls the former Intel main program
program testp
  call test()
end program testp
% cat test.f90
! the Intel main program, changed to a subroutine
subroutine test
  implicit none
  real :: A(10,10), B(10,10), C(10,10)
  A = 10.1
  B = 0.11
  C = 0.
  ! pgf90 names the module procedure cmod1_mmul_, so ifort reaches it this way
  call cmod1_mmul(A,B,C,10,10)
  print *, A(1,2), B(2,4), C(3,6)
end subroutine test
% cat cmod1.cuf
! Here's a basic CUDA Fortran kernel (element-wise add)
module cmod1
  use cudafor
contains
  attributes(global) subroutine kernel( A, B, C, N, M )
    integer, value :: N, M
    real, device :: A(N,M), B(N,M), C(N,M)
    integer :: i, j, tx, ty
    ! Get the thread indices
    tx = threadidx%x
    ty = threadidx%y
    i = (blockidx%x-1) * blockdim%x + tx
    j = (blockidx%y-1) * blockdim%y + ty
    if (i .le. N .and. j .le. M) then
      C(i,j) = A(i,j) + B(i,j)
    endif
  end subroutine kernel
  ! Pass the arrays as explicit-shape, with their extents as arguments;
  ! assumed-shape arrays rely on compiler-specific descriptors
  subroutine mmul( A, B, C, N, M )
    implicit none
    integer :: N, M, ierr
    real, dimension(N,M) :: A, B, C
    real, device, allocatable, dimension(:,:) :: Ad, Bd, Cd
    type(dim3) :: dimGrid, dimBlock
    allocate(Ad(N,M), Bd(N,M), Cd(N,M))
    Ad = A
    Bd = B
    ! round up so sizes that are not a multiple of 16 still get a block
    dimGrid = dim3( (N+15)/16, (M+15)/16, 1 )
    dimBlock = dim3( 16, 16, 1 )
    call kernel<<<dimGrid,dimBlock>>>( Ad, Bd, Cd, N, M )
    ! report the launch status
    ierr = cudaGetLastError()
    print *, "N=", N, " M=", M, " ierr=", ierr
    C = Cd
    deallocate(Ad, Bd, Cd)
  end subroutine mmul
end module cmod1
% pgf90 -c cmod1.cuf
% ifort -c test.f90
% pgf90 -c prog.f90
% pgf90 -Mcuda prog.o test.o cmod1.o -L/opt/intel/composerxe-2013.3.163/compiler/lib/intel64/ -lifport -lifcore -limf
% a.out
N= 10 M= 10 ierr= 0
10.10000 0.1100000 10.21000
|
You can also wrap up "cmod1.o" into a library.
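Calling cmod1_mmul directly depends on how pgf90 happens to name module procedures. A possibly more portable variant, sketched here and untested, is to give the entry point a fixed linker name with the Fortran 2003 bind(C) attribute. The name mmul_c is made up for illustration, and the sketch assumes both compilers support C interoperability and that default real and integer match c_float and c_int. Linking would still go through the PGI driver as shown above.

```fortran
! --- appended to cmod1.cuf after "end module cmod1" (compiled with pgf90) ---
! mmul_c is a hypothetical name; bind(C) fixes the linker symbol to "mmul_c".
subroutine mmul_c(A, B, C, N, M) bind(C, name="mmul_c")
  use iso_c_binding, only: c_float, c_int
  use cmod1, only: mmul
  implicit none
  integer(c_int), value :: N, M
  real(c_float) :: A(N,M), B(N,M), C(N,M)
  call mmul(A, B, C, N, M)     ! the module routine shown above
end subroutine mmul_c

! --- test.f90 (compiled with ifort): declare the same interface, then call ---
subroutine test
  use iso_c_binding, only: c_float, c_int
  implicit none
  interface
    subroutine mmul_c(A, B, C, N, M) bind(C, name="mmul_c")
      import :: c_float, c_int
      integer(c_int), value :: N, M
      real(c_float) :: A(N,M), B(N,M), C(N,M)
    end subroutine mmul_c
  end interface
  real(c_float) :: A(10,10), B(10,10), C(10,10)
  A = 10.1
  B = 0.11
  C = 0.
  call mmul_c(A, B, C, 10_c_int, 10_c_int)
  print *, A(1,2), B(2,4), C(3,6)
end subroutine test
```

The interface block has to match the wrapper exactly, since neither compiler can check the other's side of the call.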
- Mat
 |
wiersma
Joined: 16 May 2013 Posts: 3
|
Posted: Mon May 27, 2013 2:09 pm Post subject: |
|
|
Works great! Thanks a bunch and I'm glad to know I wasn't missing something completely obvious :).