View previous topic :: View next topic 
Author 
Message 
elephant
Joined: 24 Feb 2011 Posts: 22

Posted: Thu Apr 28, 2011 4:24 am Post subject: Parallel Prefix Sum on the GPU (Scan) 


Hello,
does anybody have a FORTRAN code example for an "upsweep" of an array that is optimal for the PGI Accelerator Programming Model?
1, 2, 3, 4, 5, 6, 7, 8, 9, ...
1, 1+2, 3, 3+4, 5, 5+6, 7, 7+8, 9,....
...
Thank you! 

Back to top 


mkcolg
Joined: 30 Jun 2004 Posts: 6734 Location: The Portland Group Inc.

Posted: Thu Apr 28, 2011 4:00 pm Post subject: 


Hi elephant,
I don't, but maybe someone else does.
Do you have an nonaccelerator example source (that's parallel) you could post? Granted, I'm not sure upsweep will work well on a GPU, but having something to start from would be helpful.
 Mat 

Back to top 


elephant
Joined: 24 Feb 2011 Posts: 22

Posted: Fri May 06, 2011 2:40 am Post subject: 


Hi,
The example source for an upsweep would look something like this:
Code: 
do d=0,int(log(dble(N))/log(2.0)1)
do i=0,N1,(2**(d+1))
T(i+2**(d+1)1)=T(i+2**d1)+T(i+2**(d+1)1)
end do
end do

Where T is the 1dim array of size N which entries has to be summed up.
But since I only have to sum up every 8 neighboring entries (sum(1:8), sum(9:16),... ) and not the whole array, I rearanged N to another array Q of size (N/8,8) and wrote a code like the following and got very good speedup!!!
Code: 
!$acc region
do i=1,N/8
Qtemp1(i) = Q(i,1)+Q(i,2)+Q(i,3)+Q(i,4) &
+Q(i,5)+Q(i,6)+Q(i,7)+Q(i,8)
end do
!$acc end region

So, problem solved.
Thank you! 

Back to top 




You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum

Powered by phpBB © phpBB Group
