elephant

Joined: 24 Feb 2011

Posted: Thu Apr 28, 2011 4:24 am Post subject: Parallel Prefix Sum on the GPU (Scan) 


Hello,
Does anybody have a FORTRAN code example for an "upsweep" of an array that is optimal for the PGI Accelerator Programming Model?
1, 2, 3, 4, 5, 6, 7, 8, 9, ...
1, 1+2, 3, 3+4, 5, 5+6, 7, 7+8, 9,....
...
Thank you! 

mkcolg

Joined: 30 Jun 2004

Posted: Thu Apr 28, 2011 4:00 pm 


Hi elephant,
I don't, but maybe someone else does.
Do you have a non-accelerator example source (one that's parallel) you could post? Granted, I'm not sure an upsweep will work well on a GPU, but having something to start from would be helpful.
 Mat 

elephant

Joined: 24 Feb 2011

Posted: Fri May 06, 2011 2:40 am 


Hi,
The example source for an upsweep would look something like this:
Code: 
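! Up-sweep (reduce) phase: after level d, every 2**(d+1)-th entry holds a partial sum.
! Assumes N is a power of two and T is declared as T(0:N-1).
do d=0,nint(log(dble(N))/log(2.0d0))-1
   do i=0,N-1,2**(d+1)
      T(i+2**(d+1)-1)=T(i+2**d-1)+T(i+2**(d+1)-1)
   end do
end do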

Where T is the one-dimensional array of size N whose entries have to be summed up.
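For anyone who wants to try the loop above on its own, here is a minimal sequential sketch wrapped in a module. The names scan_sketch, upsweep and dp are hypothetical, and it assumes N is a power of two with T indexed 0..N-1. It computes log2(N) with integer shifts instead of log() to avoid rounding surprises. No accelerator directives are included; the inner loop over i is the one whose iterations are independent for a fixed d.

```fortran
module scan_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Up-sweep (reduce) phase of a work-efficient scan, run sequentially.
  ! Assumption: n is a power of two and t is indexed 0..n-1.
  subroutine upsweep(t, n)
    integer, intent(in) :: n
    real(dp), intent(inout) :: t(0:n-1)
    integer :: d, i, stride, levels

    ! levels = log2(n), computed with integer shifts (no floating point).
    levels = 0
    do while (ishft(1, levels) < n)
      levels = levels + 1
    end do

    do d = 0, levels - 1
      stride = 2**(d+1)
      ! For a fixed d these iterations touch disjoint elements,
      ! so this is the loop that could run in parallel.
      do i = 0, n-1, stride
        t(i+stride-1) = t(i+2**d-1) + t(i+stride-1)
      end do
    end do
  end subroutine upsweep
end module scan_sketch
```

With N = 8 and T = 1, 2, ..., 8, the first level produces 1, 3, 3, 7, 5, 11, 7, 15 (the "1, 1+2, 3, 3+4, ..." pattern from the first post), and after the last level T(7) holds 36, the sum of the whole array.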
But since I only have to sum up every 8 neighboring entries (sum(1:8), sum(9:16), ...) and not the whole array, I rearranged T into another array Q of size (N/8,8) and wrote code like the following, which gave a very good speedup!!!
Code: 
!$acc region
do i=1,N/8
Qtemp1(i) = Q(i,1)+Q(i,2)+Q(i,3)+Q(i,4) &
+Q(i,5)+Q(i,6)+Q(i,7)+Q(i,8)
end do
!$acc end region
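A sketch of the rearrangement step, under the assumption that Q(i,j) is meant to hold element j of block i (so row i sums to sum(T(8*(i-1)+1 : 8*i))) and that N is a multiple of 8. The module and subroutine names are hypothetical. One pitfall worth noting: because Fortran arrays are column-major, reshape(T,[N/8,8]) by itself would group T(i), T(i+N/8), ... instead of 8 neighbors, so the sketch reshapes to (8,N/8) and transposes. The !$acc directives are left out so it builds with any Fortran compiler.

```fortran
module block_sum_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! out(1) = sum(t(1:8)), out(2) = sum(t(9:16)), ...
  ! Assumption: n is a multiple of 8 and t is indexed 1..n.
  subroutine block_sums_of_8(t, n, out)
    integer, intent(in) :: n
    real(dp), intent(in) :: t(n)
    real(dp), intent(out) :: out(n/8)
    real(dp) :: q(n/8, 8)
    integer :: i, j

    ! reshape fills column-major: r(j,i) = t(8*(i-1)+j); transposing
    ! gives q(i,j) = element j of block i.
    q = transpose(reshape(t, [8, n/8]))

    ! One independent 8-term sum per block (the loop inside the acc region).
    do i = 1, n/8
      out(i) = 0.0_dp
      do j = 1, 8
        out(i) = out(i) + q(i, j)
      end do
    end do
  end subroutine block_sums_of_8
end module block_sum_sketch
```

For T = 1, 2, ..., 16 this gives 36 for the first block and 100 for the second. Each output element depends only on its own row of Q, which is why this formulation avoids the log2(N) dependent levels of a full upsweep.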

So, problem solved.
Thank you! 

