PGI User Forum


On Matmul Kernels for Cuda Fortran

 
kevinlee



Joined: 11 May 2008
Posts: 5

Posted: Sun Apr 18, 2010 2:54 pm    Post subject: On Matmul Kernels for Cuda Fortran

This question relates to Michael Wolfe's October 2008 article in HPCWire, "Compilers and More: Optimizing GPU Kernels" (available online at http://www.hpcwire.com/features/Compilers_and_More_Optimizing_GPU_Kernels.html?viewAll=y).

All the examples for matmul were done for CUDA C; would the same barrage of tests for CUDA Fortran yield the same results?

That is, are there any nuances because of Fortran programming (i.e. column-major ordering vs. row-major ordering), etc.?

I know the above is a very generic question, but my goal would be to transfer some of these CUDA C kernels over to CUDA Fortran and benchmark.

Also, I wanted to query if there are more examples of optimizing matmul kernels in Fortran beyond the one presented in the PGroup Cuda Fortran programming guide and Insider. Spam me with links or replies, thanks!
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Thu Apr 22, 2010 10:14 am

Quote:
All the examples for matmul were done for CUDA C; would the same barrage of tests for CUDA Fortran yield the same results?
Not having gone through the process, I can't be sure, but I believe it should.

Quote:
That is, are there any nuances because of Fortran programming (i.e. column-major ordering vs. row-major ordering), etc.?
Since Fortran is column-major, you do need to keep this in mind when optimizing for memory usage. For an example, see the section titled "Improving Warp Performance" in my PGInsider Monte Carlo article: http://www.pgroup.com/lit/articles/insider/v2n1a4.htm
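
To make the column-major point concrete, here is a minimal sketch in plain Python (not CUDA Fortran, and not taken from the article). It only computes linear memory offsets; the function names and the 0-based indexing are assumptions chosen for illustration. The idea is that neighbouring GPU threads should touch neighbouring addresses, so which array index the thread ID drives matters differently in Fortran than in C.

```python
# Illustrative sketch: linear memory offsets of element (i, j) in a 2D array.
# Indices are 0-based here for simplicity; Fortran arrays default to 1-based.
# All function names are hypothetical, chosen only for this example.

def column_major_offset(i, j, nrows):
    """Fortran layout: elements of a column are contiguous."""
    return i + j * nrows


def row_major_offset(i, j, ncols):
    """C layout: elements of a row are contiguous."""
    return i * ncols + j


def adjacent_thread_strides(nrows, ncols):
    """Distance, in elements, between the addresses touched by two
    neighbouring threads, for each layout and each index the thread ID drives."""
    return {
        ("fortran", "first index"):
            column_major_offset(1, 0, nrows) - column_major_offset(0, 0, nrows),
        ("fortran", "second index"):
            column_major_offset(0, 1, nrows) - column_major_offset(0, 0, nrows),
        ("c", "first index"):
            row_major_offset(1, 0, ncols) - row_major_offset(0, 0, ncols),
        ("c", "second index"):
            row_major_offset(0, 1, ncols) - row_major_offset(0, 0, ncols),
    }


if __name__ == "__main__":
    for key, stride in adjacent_thread_strides(nrows=1024, ncols=512).items():
        print(key, stride)
```

For a 1024 x 512 array, the Fortran layout gives a stride of 1 when the thread ID drives the first index and a stride of 1024 when it drives the second; the C layout is the reverse. So when porting a CUDA C kernel, you would generally want the thread's x index mapped to the first subscript of the Fortran array rather than the last.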

Quote:
Also, I wanted to query if there are more examples of optimizing matmul kernels in Fortran beyond the one presented in the PGroup Cuda Fortran programming guide and Insider. Spam me with links or replies, thanks!
While I'm working on more examples, I hadn't planned on adding more matmul variations. Any students out there looking for a paper topic?

- Mat
All times are GMT - 7 Hours