PGI User Forum
How to fill a very large array randomly using CUDA

Joined: 30 Jun 2004
Posts: 6745
Location: The Portland Group Inc.

Posted: Mon Apr 26, 2010 4:23 pm

Hi Rob,

Quote:
I can't see how to cast this into code that looks like yours - it's the random nature of the way a Monte Carlo code works that's giving me grief. Is the way around this to store an array of values of indx then sum up using your strip mine example?

I wrote an article for the PGInsider (our newsletter) that walks through a simple Monte Carlo code and might help you. You can't call a random number generator from a device kernel, so you will need to pre-compute these values. The article gives a method for doing this.

The article also shows how to perform a simple sum reduction. Your histogram will follow a similar form, except that instead of a single element in an array, each thread would need its own "bin".
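To make the per-thread "bin" idea concrete, here is a minimal host-side sketch in Python. It is not the code from the article and not CUDA: it only emulates the pattern on the CPU. The function name, the strided loop, and the thread and bin counts are assumptions chosen for illustration.

```python
import random

def histogram_with_private_bins(samples, num_bins, num_threads):
    """Emulate num_threads GPU threads, each filling its own private histogram."""
    # One row of bins per emulated thread, so no two threads share a counter.
    private = [[0] * num_bins for _ in range(num_threads)]
    for tid in range(num_threads):
        # Thread tid handles samples tid, tid + num_threads, ... (strided loop).
        for i in range(tid, len(samples), num_threads):
            # Map a value in [0, 1) to a bin index; the clamp is a safety guard.
            b = min(int(samples[i] * num_bins), num_bins - 1)
            private[tid][b] += 1
    # Second pass (a separate "sum" kernel on a GPU): add the private rows.
    return [sum(row[b] for row in private) for b in range(num_bins)]

# The random numbers are generated up front on the host, as suggested above.
rng = random.Random(12345)  # fixed seed, chosen only for reproducibility
samples = [rng.random() for _ in range(10_000)]
hist = histogram_with_private_bins(samples, num_bins=8, num_threads=10)
print(hist, sum(hist))
```

Because every emulated thread writes only to its own row, the first pass needs no atomic updates; the cost is extra memory (threads times bins) and the second pass that sums the rows.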

Quote:
One other quick question - doesn't the fact that you are only using 10 threads to do the summations make this algorithm slow?

Any reduction ultimately has a serial portion, and that portion will be slow. If done correctly, though, the serial portion is very small and has little impact on overall performance.
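One common way to keep the serial portion small is a pairwise (tree) reduction. The sketch below is a CPU emulation in Python under that assumption, not the method from the article; `tree_reduce` is a hypothetical name. It counts the steps that have to happen one after another, while the adds inside a step are independent of each other.

```python
def tree_reduce(values):
    """Pairwise (tree) sum; returns (total, number_of_sequential_steps)."""
    data = list(values)
    n = len(data)
    steps = 0
    while n > 1:
        half = (n + 1) // 2
        # On a GPU, each add in this inner loop would be done by its own
        # thread, all at the same time; here they simply run in order.
        for i in range(n - half):
            data[i] += data[i + half]
        n = half
        steps += 1
    return (data[0] if data else 0), steps

total, steps = tree_reduce(range(1, 1025))
print(total, steps)  # 524800 10
```

Summing 1,024 values takes 10 sequential steps instead of 1,023 sequential adds, which is why the part that cannot be parallelized stays small.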

Quote:
Do you need to call syncthreads inside process_kernel to make sure it's completed before process_kernel_sum tries to do its work?

No. Synchronization is implicit between kernels launched in the same stream: they run in order, so process_kernel finishes before process_kernel_sum starts.

Quote:
Would you be willing to have a look at the code itself? - it's about 10 times longer than the example I've used here, but well documented and hopefully easy to read. Let me know - I fully appreciate that it's not really your job to help customers with their code so no worries if you haven't got time - thanks for all the help so far....

First, why don't you see if my article helps. If you're still having problems, then we can take a look at your code for a few minutes and try to send you in the right direction.

Note that you might take a look at the PGI Accelerator Model as well. As my article shows, it does a great job with the kernel and reduction code. So if your code is dominated by the compute portion rather than by copying the data over to the GPU (as my little example is), then it might be the best way to go. Also, we are working on supporting CUDA device data within Accelerator regions. Once available, it will solve the overhead of copying the random numbers to the GPU.

- Mat
All times are GMT - 7 Hours