PGI User Forum

reduction operation
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Tue Sep 07, 2010 1:56 am    Post subject: How does the reduction with PGI Acc work?

Hi,
It's really good that the PGI Accelerator Model now recognizes reductions and manages them. But I would like to know how the reduction works internally with the Accelerator model.
Thanks in advance.
Sandra
mkcolg



Joined: 30 Jun 2004
Posts: 6146
Location: The Portland Group Inc.

Posted: Thu Sep 09, 2010 11:28 am

Hi Sandra,

Once the compiler recognizes a reduction, the compiler will generate an intermediate array to hold the reduction values for each thread. After the main kernel completes, a second highly-optimized kernel is launched to perform the actual reduction. If you're interested, NVIDIA has posted a slide-deck detailing how to create optimized reductions HERE.
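
For illustration, the two-step scheme described above can be sketched in CUDA for a sum reduction. This is only a minimal sketch, not the code the PGI compiler actually generates: the names `mainKernel`, `reduceKernel`, `reduceSum`, and `kThreads`, as well as the launch sizes (64 blocks of 256 threads), are assumptions made for the example, and the second kernel is a plain shared-memory tree reduction, not a fully tuned one. Error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block (assumed; must be a power of two)

// Step 1, standing in for the "main kernel": each thread accumulates a private
// sum over a grid-stride loop and stores it in its own slot of the
// intermediate array.
__global__ void mainKernel(const float* in, float* partial, int n) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    float sum = 0.0f;
    for (int i = gid; i < n; i += stride) sum += in[i];
    partial[gid] = sum;
}

// Step 2: a single block folds the m per-thread values into one result.
__global__ void reduceKernel(const float* partial, float* out, int m) {
    __shared__ float s[kThreads];
    int t = threadIdx.x;
    float sum = 0.0f;
    for (int i = t; i < m; i += blockDim.x) sum += partial[i];
    s[t] = sum;
    __syncthreads();
    // Tree reduction in shared memory: halve the active range each round.
    for (int w = blockDim.x / 2; w > 0; w >>= 1) {
        if (t < w) s[t] += s[t + w];
        __syncthreads();
    }
    if (t == 0) *out = s[0];
}

// Host wrapper (hypothetical helper): copies the input to the GPU, runs both
// kernels, and copies the single result back.
float reduceSum(const float* h_in, int n) {
    const int blocks = 64;            // assumed launch size
    const int m = blocks * kThreads;  // one intermediate slot per thread
    float *d_in, *d_partial, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, m * sizeof(float));  // intermediate array, on the GPU
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    // Both launches go to the default stream, so they run in order: the
    // second kernel starts only after every thread of the first has finished.
    mainKernel<<<blocks, kThreads>>>(d_in, d_partial, n);
    reduceKernel<<<1, kThreads>>>(d_partial, d_out, m);

    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

In this sketch the only data that travels back to the host is the final scalar. The intermediate array `d_partial` is allocated with `cudaMalloc` and never leaves device memory.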

Hope this helps,
Mat
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Thu Sep 09, 2010 11:53 pm

Hi Mat,
Thanks for your response. So, did I understand it right, that you get synchronization by using a second kernel? And is then the intermediate array located on CPU memory? I.e., if I use our cluster batch system, do I have to reserve memory for this intermediate array as well?
Last question: So, for your second kernel do you use the last optimized algorithm from the nvidia-reduction-slides?
Cheers, Sandra
mkcolg



Joined: 30 Jun 2004
Posts: 6146
Location: The Portland Group Inc.

Posted: Fri Sep 10, 2010 8:52 am

Quote:
So, did I understand it right, that you get synchronization by using a second kernel?
Yes.

Quote:
then the intermediate array located on CPU memory?
No, it's on the GPU.

Quote:
So, for your second kernel do you use the last optimized algorithm from the nvidia-reduction-slides?
I do believe that this is the standard algorithm for performing reductions.

- Mat
All times are GMT - 7 Hours