Bullish
Joined: 21 Mar 2010 Posts: 5
Posted: Mon May 10, 2010 2:37 am Post subject: how to carry out the sum operation in cuda fortran?
For a large array, it's fairly easy to implement a sum reduction in CUDA C via pointers, and I wonder how to perform this operation efficiently in CUDA Fortran on the GPU.
Thanks!
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Bullish
Joined: 21 Mar 2010 Posts: 5
Posted: Tue May 11, 2010 12:31 am Post subject:
Hi Mat,
First, thank you for your reply. The sum operation I mentioned is the Fortran intrinsic function sum(). I tried to rewrite sum() in CUDA Fortran, but the GPU code is much slower than the CPU version. As far as I know, CUDA Fortran doesn't support direct memory-address operations, so the GPU's capability is hard to fully exploit even with the partial-sum trick. Have you encountered this problem?
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue May 11, 2010 9:22 am Post subject:
Hi Bullish,
Using the sum intrinsic from within a device kernel would be very slow, since each thread would perform the entire sum and need to access the device's global memory. I would advise against using the reduction intrinsics in a device kernel unless you are reducing a small local or shared array.
To perform reductions efficiently, you should follow the partial-reduction examples described earlier. Note that sum reductions on a GPU are not expected to be faster than on the CPU; they should only be used when the cost of transferring the data exceeds the cost of the reduction itself.
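The partial-reduction idea can be sketched in CUDA Fortran as follows. This is a minimal illustrative kernel, not code from the PGI examples: it assumes a fixed block size of 256 threads, and the names partial_sum, a, and partial are made up for this sketch. Each block reduces its slice of the array into shared memory with a tree reduction, writes one partial sum per block, and the small partial array is then summed on the host (or in a second kernel launch).

Code:
! Illustrative two-stage partial reduction (block size assumed to be 256)
attributes(global) subroutine partial_sum(a, partial, n)
  real, device :: a(*), partial(*)
  integer, value :: n
  real, shared :: s(256)
  integer :: i, tid, stride

  tid = threadidx%x
  i   = (blockidx%x - 1) * blockdim%x + tid
  s(tid) = 0.0
  ! Grid-stride loop so any n is covered
  do while (i <= n)
     s(tid) = s(tid) + a(i)
     i = i + blockdim%x * griddim%x
  end do
  call syncthreads()

  ! Tree reduction within the block (1-based thread indices)
  stride = blockdim%x / 2
  do while (stride >= 1)
     if (tid <= stride) s(tid) = s(tid) + s(tid + stride)
     call syncthreads()
     stride = stride / 2
  end do

  ! One partial sum per block
  if (tid == 1) partial(blockidx%x) = s(1)
end subroutine

Because the final partial array has only one element per block, summing it on the host is cheap compared to the kernel itself.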
Note that as of the 10.5 release, the PGI Accelerator model is able to use CUDA Fortran device data. This allows you to use the PGI Accelerator's highly optimized reductions from within CUDA Fortran. For example, from the host add the following and then compile with "-ta=nvidia".
Code:
!$acc region
sumVal = sum(devArr)
!$acc end region
As for your question about direct memory address (DMA) operations, again I'm not clear on what you mean. DMA has to do with how data is transferred between the CPU and GPU. Do you mean pinned memory (which is supported in CUDA Fortran)?
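For reference, pinned (page-locked) host memory in CUDA Fortran is requested with the pinned attribute on an allocatable host array; the array names below are illustrative only.

Code:
! Illustrative: pinned host allocation speeds up host<->device transfers
real, allocatable, pinned :: h_a(:)
real, allocatable, device :: d_a(:)
allocate(h_a(1000000), d_a(1000000))
h_a = 1.0
d_a = h_a   ! copy from pinned host memory to the device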
- Mat