DAVID-SPH
Joined: 23 May 2011 Posts: 28
Posted: Wed Aug 15, 2012 12:29 pm
Well, this is the function giving all the trouble. I guess it is still the same issue. I'm porting Sean Baxter's scan and radix sort subroutines to CUDA Fortran. Is this still the problem with the shared memory declaration?
Code:
type integer2
  integer :: x
  integer :: y
end type

attributes(device) type(integer2) function Multiscan(tid, x, reduction_shared, totals_shared)
  integer :: tid
  integer :: x
  integer :: warp, lane, i, sum, offset
  integer :: total, totalsSum
  type(integer2) :: result
  integer, volatile, dimension(:) :: reduction_shared !(SCANSIZE)!(*)!(:)!(3256)!(ScanSize)
  integer, volatile, dimension(:) :: totals_shared !((*)! (48)!(NUM_WARPS + NUM_WARPS/2)
  integer, volatile :: s, s2 !we have a problem here in the translation of

  warp = tid / WARP_SIZE ! check this one for fortran charac.
  lane = IAND((WARP_SIZE - 1), tid) + 1 ! in Fortran we start at 1; in C: (WARP_SIZE - 1) & tid
  s = SCANSTRIDE * warp + lane + WARP_SIZE / 2 ! index/pointer
  reduction_shared(s - 16) = 0 ! The first 32 positions will be filled with zeros
  reduction_shared(s) = x ! And now only the first 16 will...

  !! Run inclusive scan on each warp's data.
  sum = x
  ! The CUDA Fortran compiler is supposed to unroll the loop for us...
  do i = 1, LOG_WARP_SIZE
    offset = ISHFT(1, i-1) ! 1 << (i - 1)
    sum = sum + reduction_shared(s-offset)
    reduction_shared(s) = sum ! store the running sum so the next step can read it
  end do

  !! Synchronize to make all totals available to the reduction code
  call syncthreads()
  if (tid < NUM_WARPS) then
    !! Grab the block total for the tid'th block. This is the last element
    !! in the block's scanned sequence. This operation avoids bank
    !! conflicts.
    total = reduction_shared(ScanStride * tid + WARP_SIZE/2 + WARP_SIZE) !- 1) !this -1 may be eliminated
    totals_shared(tid) = 0
    s2 = NUM_WARPS / 2 + tid
    totalsSum = total
    totals_shared(s2) = total
    !! The compiler should unroll this one
    do i = 1, LOG_NUM_WARPS
      offset = ISHFT(1, i-1) ! 1 << (i - 1)
      totalsSum = totalsSum + totals_shared(s2-offset)
      totals_shared(s2) = totalsSum
    end do
    !! Subtract total from totalsSum for an exclusive scan.
    totals_shared(tid) = totalsSum - total
  end if

  !! Synchronize to make the block scan available to all warps
  call syncthreads()
  sum = sum + totals_shared(warp)
  total = totals_shared(NUM_WARPS + NUM_WARPS / 2) !)- 1) !el - 1
  result%x = sum
  result%y = total
  !!!!!!!!!!!!!!!!!!!!! and return...
  Multiscan = result
end function Multiscan
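For readers following the indexing, here is a minimal host-side sketch of the zero-padded warp scan that the first loop is aiming for. It is plain Fortran rather than CUDA Fortran, and it rests on assumptions: one warp of 32 lanes, a sample input of all ones, and each step storing the running sum back into the buffer. The program name, `PAD`, `buf`, and `sums` are illustrative and not part of the original code. The two inner loops stand in for the lockstep read-then-write of a warp.

```fortran
program warp_scan_sketch
  implicit none
  integer, parameter :: WARP_SIZE = 32, LOG_WARP_SIZE = 5
  integer, parameter :: PAD = WARP_SIZE / 2   ! zero slots in front of the data
  integer :: buf(PAD + WARP_SIZE)             ! stands in for reduction_shared
  integer :: sums(WARP_SIZE)                  ! per-lane running sum
  integer :: lane, i, offset

  buf(1:PAD) = 0        ! padding: reads that fall off the left edge return 0
  buf(PAD + 1:) = 1     ! sample input, one value per lane
  sums = 1

  offset = 1
  do i = 1, LOG_WARP_SIZE
    ! every lane reads the value "offset" slots to its left ...
    do lane = 1, WARP_SIZE
      sums(lane) = sums(lane) + buf(PAD + lane - offset)
    end do
    ! ... and only then does every lane write its running sum back
    do lane = 1, WARP_SIZE
      buf(PAD + lane) = sums(lane)
    end do
    offset = offset * 2
  end do

  ! inclusive scan of all ones: lane k must hold k
  do lane = 1, WARP_SIZE
    if (sums(lane) /= lane) stop 1
  end do
  print *, 'warp total =', sums(WARP_SIZE)
end program warp_scan_sketch
```

With the all-ones input the last lane ends up holding 32, which is the per-warp total that the second half of the function picks up.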
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Aug 15, 2012 3:13 pm
Hi David,
I can't really tell much from this. Can you send a reproducing example to PGI Customer Service (trs@pgroup.com) and ask them to send it to me?
Also, which error are you getting with this code? The function 0 ICE or the Shared dummy as an argument?
Thanks,
Mat
DAVID-SPH
Joined: 23 May 2011 Posts: 28
Posted: Thu Aug 16, 2012 9:30 pm
It is the 0 ICE problem.
The shared memory dummy issue was solved with your tip.
I'll try to send the full code later today.
Thanks
DAVID-SPH
Joined: 23 May 2011 Posts: 28
Posted: Sun Aug 19, 2012 8:46 am
OK, the problem seems to be the ISHFT bit intrinsic. Any reason for that?
According to the CUDA Fortran reference, it is a perfectly valid call:
integer ishft(integer, integer)
It is relatively easy to substitute, since I only use it to calculate powers of 2, but bit intrinsics are fast and I would like to use them.
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Aug 20, 2012 10:11 am
Yep, it's the ISHFT. CUDA Fortran does support ISHFT, but currently only if the "shift" argument is a constant. With a constant shift, ISHFT is inlined, but when the shift is a variable, a call is emitted.
I asked engineering and they do have these on their TODO list but it was pushed to a lower priority (you're the first to ask for these). I added a report (TPR#18883) to help track this and the other missing elemental functions.
Thanks,
Mat
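Going by the explanation above, one possible workaround is to drop the variable shift count altogether and carry the offset as a running value that doubles on every iteration. The sketch below is host-side standard Fortran, not CUDA Fortran, and has not been tried against the PGI compiler. The program name is illustrative, and `LOG_WARP_SIZE = 5` assumes a warp size of 32. It only checks that the doubled offset matches `ISHFT(1, i-1)` at each step.

```fortran
program shift_workaround_sketch
  implicit none
  integer, parameter :: LOG_WARP_SIZE = 5   ! assumed: warp size of 32
  integer :: i, offset

  offset = 1
  do i = 1, LOG_WARP_SIZE
    ! offset equals ISHFT(1, i-1) here, with no variable shift count involved
    if (offset /= ishft(1, i - 1)) stop 1
    print *, i, offset
    ! constant shift count of 1; offset * 2 gives the same result
    offset = ishft(offset, 1)
  end do
end program shift_workaround_sketch
```

In the device function, the same pattern would mean setting `offset = 1` before each of the two loops and updating it at the end of every iteration, in place of the `ISHFT(1, i-1)` assignment.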