goblinsqueen
Joined: 04 Feb 2010 Posts: 14
|
Posted: Thu Apr 01, 2010 3:03 am Post subject: Performance CUDA fortran |
|
|
Please consider the following simple code, a deliberately simplified ("stupid") version of the code I'm porting to CUDA Fortran.
| Code: |
module mod_test
  use cudafor
  implicit none
contains
  attributes(device) real*8 function calc2(i,j,k,l)
    implicit none
    integer, value :: i,j,k,l
    calc2 = (log(real(i))+log(real(j)))/exp(real(k))/exp(real(l))
  end function
  attributes(device) subroutine calc(counter,idx,Vettore)
    implicit none
    integer, value :: counter, idx
    real*8, dimension(counter),device :: Vettore
    real*8, dimension(32), shared :: vettore_shared
    real*8 :: var
    integer :: i,j,k,l, tid
    tid = threadidx%x
    do i=1,20
      do j=1,20
        do k=1,12
          do l=1,200
            !var = calc2(i,j,k,l) ! fast solution
            vettore_shared(tid) = calc2(i,j,k,l) ! slow
            !Vettore(idx) = calc2(i,j,k,l) ! slow
          end do
        enddo
      enddo
    enddo
  end subroutine
  attributes(global) subroutine kernel_test(counter,Vettore)
    implicit none
    integer, value :: counter
    real*8, dimension(counter),device :: Vettore
    integer :: idx
    idx = (blockidx%x-1)*blockdim%x + threadidx%x
    call calc(counter,idx,Vettore)
  end subroutine
end module
program test
  use cudafor
  use mod_test
  implicit none
  integer :: nblocks, nthreads, counter
  integer :: c1, c2
  real*8, dimension(:), allocatable :: Vettore_host
  real*8, dimension(:), allocatable, device :: Vettore_dev
  counter = 9216
  nthreads = 32
  nblocks = counter/nthreads
  call system_clock(count=c1)
  allocate(Vettore_host(counter),Vettore_dev(counter))
  Vettore_dev = 0.d0
  Vettore_host = 0.d0
  call kernel_test<<<nblocks,nthreads>>>(counter,Vettore_dev)
  Vettore_host = Vettore_dev
  deallocate(Vettore_host,Vettore_dev)
  call system_clock(count=c2)
  write(*,*) 'time ', c2-c1
end program
|
I have a routine that computes one element of the array Vettore_dev. If I update the device array Vettore_dev directly in the device subroutine calc, the code takes ~4 seconds to execute. What surprises me is that if I write to a shared array instead, i.e. vettore_shared, it still takes ~4 seconds.
Why? I expected the program to be much faster with a shared memory array.
What's wrong?
Thank you in advance for every comment! |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Thu Apr 01, 2010 4:43 pm Post subject: |
|
|
Hi goblinsqueen,
The compiler is smart enough to notice that the value of 'var' is never used, so it optimizes the work away. That is why this version is so much faster. Instead, at the bottom of the last do loop, add "Vettore(idx) = var". Note that the "var" version will still be faster than the other two, since "var" can be held in a register.
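A minimal sketch of that suggestion follows; it is not tested code from this thread. The module name mod_test_var is hypothetical. Accumulating into var with "var = var + ..." instead of overwriting it is an assumption added here, so that the result of every iteration feeds the value that is finally stored and the compiler has less room to discard loop work.

```fortran
module mod_test_var   ! hypothetical name, stands in for mod_test
  use cudafor
  implicit none
contains
  attributes(device) real*8 function calc2(i,j,k,l)
    implicit none
    integer, value :: i,j,k,l
    calc2 = (log(real(i))+log(real(j)))/exp(real(k))/exp(real(l))
  end function

  attributes(device) subroutine calc(counter,idx,Vettore)
    implicit none
    integer, value :: counter, idx
    real*8, dimension(counter), device :: Vettore
    real*8 :: var          ! thread-local scalar, a candidate for a register
    integer :: i,j,k,l
    var = 0.d0
    do i=1,20
      do j=1,20
        do k=1,12
          do l=1,200
            ! assumption: accumulate so that each call contributes
            var = var + calc2(i,j,k,l)
          end do
        end do
      end do
    end do
    ! a single global-memory write, after all the loops
    Vettore(idx) = var
  end subroutine

  attributes(global) subroutine kernel_test(counter,Vettore)
    implicit none
    integer, value :: counter
    real*8, dimension(counter), device :: Vettore
    integer :: idx
    idx = (blockidx%x-1)*blockdim%x + threadidx%x
    call calc(counter,idx,Vettore)
  end subroutine
end module
```

With "use mod_test_var" in place of "use mod_test", the host program from the first post should work unchanged. The timing then includes the real arithmetic, while the 96,000 writes per thread to shared or global memory shrink to one.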
Now as to the differences between shared and global memory, I'm still in the process of learning this myself, so unfortunately I don't have any great insights. My experiments show little speed-up from shared memory unless you're able to reuse the memory many times. Why? I'm not sure yet.
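To make "reuse" concrete, here is a hedged sketch, not a benchmark. The names mod_reuse, block_sum, nt and tile are all hypothetical, and the kernel assumes it is launched with exactly nt threads per block. Each thread reads global memory once, and every value staged in shared memory is then read by all nt threads of the block. The original test, in contrast, only writes to vettore_shared and never reads it back, so there is nothing for shared memory to speed up.

```fortran
module mod_reuse   ! hypothetical example module
  use cudafor
  implicit none
  integer, parameter :: nt = 32   ! assumed threads per block
contains
  attributes(global) subroutine block_sum(n, a, b)
    implicit none
    integer, value :: n
    real*8, dimension(n), device :: a, b
    real*8, dimension(nt), shared :: tile
    real*8 :: acc
    integer :: idx, tid, m
    tid = threadidx%x
    idx = (blockidx%x-1)*blockdim%x + tid
    ! stage one element per thread: a single global read each
    if (idx <= n) then
      tile(tid) = a(idx)
    else
      tile(tid) = 0.d0
    end if
    call syncthreads()   ! the tile must be complete before anyone reads it
    ! reuse: every thread reads all nt staged values from shared memory
    acc = 0.d0
    do m = 1, nt
      acc = acc + tile(m)
    end do
    if (idx <= n) b(idx) = acc
  end subroutine
end module

program test_reuse
  use cudafor
  use mod_reuse
  implicit none
  integer, parameter :: n = 9216
  real*8, dimension(:), allocatable :: a_host, b_host
  real*8, dimension(:), allocatable, device :: a_dev, b_dev
  allocate(a_host(n), b_host(n), a_dev(n), b_dev(n))
  a_host = 1.d0
  a_dev = a_host
  call block_sum<<<n/nt, nt>>>(n, a_dev, b_dev)
  b_host = b_dev
  ! with a = 1 everywhere, each b(i) is the sum of nt ones
  write(*,*) 'b(1) = ', b_host(1), ' b(n) = ', b_host(n)
  deallocate(a_host, b_host, a_dev, b_dev)
end program
```

Without the shared tile, each thread would have to fetch all nt values of its block from global memory itself. Whether that difference shows up in wall-clock time depends on the hardware and on how much arithmetic surrounds the memory accesses.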
Any other users have any insights?
- Mat |
|
sinsin
Joined: 11 Dec 2008 Posts: 11
|
Posted: Thu Apr 01, 2010 8:55 pm Post subject: |
|
|
Dear Mat,
How do you know that "var" can be held in a register?
Many Thanks!
Sin sin |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Fri Apr 02, 2010 8:44 am Post subject: |
|
|
Hi Sin sin,
I don't know for sure, just that it can. Though, given that it's a scalar with a very high degree of re-use and that I see a speed-up, it most likely is being placed in a register. Can I prove it? No. Most likely? Yes.
- Mat |
|
sinsin
Joined: 11 Dec 2008 Posts: 11
|
Posted: Fri Apr 02, 2010 5:54 pm Post subject: |
|
|
Hi Mat,
Thanks a lot! I just wonder whether it is possible to control which variables are held in registers.
Sin sin |
|