PGI User Forum


CUDA Fortran + float3/float4

 
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Wed Apr 06, 2011 5:10 am    Post subject: CUDA Fortran + float3/float4

Hello,
NVIDIA's CUDA C has built-in vector data types such as float3 and float4 (which, as far as I know, promise good memory access patterns and alignment).

Does CUDA Fortran have analogous derived types?
If I do it manually (see below), then I don't know how to ensure correct alignment...
Code:
type :: float4
  sequence
  real*4:: x,y,z,w
end type float4


Cheers, Sandra
mkcolg



Joined: 30 Jun 2004
Posts: 6128
Location: The Portland Group Inc.

Posted: Thu Apr 07, 2011 10:11 am    Post subject: CUDA Fortran + float3/float4

Hi Sandra,

No, CUDA Fortran does not support these vector types. Though since Fortran allows you to perform operations on whole arrays, I'm wondering if they are necessary. Wouldn't declaring a 3- or 4-element array work?

- Mat
mkcolg



Joined: 30 Jun 2004
Posts: 6128
Location: The Portland Group Inc.

Posted: Mon Apr 11, 2011 1:35 pm    Post subject: CUDA Fortran + float3/float4

Hi Sandra,

Here's the response from Michael:

Quote:
The vector data types (float3, float4) are important when programming
with OpenCL for the ATI, but aren't needed for good performance on
NVIDIA. They are used in CUDA mostly for texture and surface references.

We don't have an analog to these vector data types in CUDA Fortran.


- Mat
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Tue Apr 12, 2011 11:39 pm    Post subject: CUDA Fortran + float3/float4

Hi,
Thanks for the information. Two more comments from my side. We used the float4 type with CUDA C and got better performance on several NVIDIA GPUs (although not on Fermi, I believe). I think float4 types are aligned, which could give better performance in some cases. If I use Fortran's array operations instead, it should be almost the same, but what about alignment in Fortran?
Thus, I think, there could be differences.
Bye, Sandra
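
On the alignment question, one thing that can be checked directly is the size of the hand-written type. The sketch below is plain host Fortran, not CUDA Fortran, and assumes that a `sequence` type of four 4-byte reals is stored without padding; the names `float4_size` and `p` are made up for illustration. If one element occupies 16 bytes and the array starts on a 16-byte boundary, then every element does too. Whether the compiler then issues a single 128-bit load is a separate matter that this check does not show.

```fortran
program float4_size
  use iso_fortran_env, only: real32
  implicit none

  ! Hand-written analogue of CUDA C's float4, as in the first post.
  type :: float4
    sequence
    real(real32) :: x, y, z, w
  end type float4

  type(float4) :: p(8)

  ! storage_size returns the size of one array element in bits.
  print *, 'bits per element :', storage_size(p)
  print *, 'bytes per element:', storage_size(p) / 8
end program float4_size
```

A result of 128 bits would mean the elements of `p` are packed back to back at a 16-byte stride; any other value would indicate compiler-inserted padding.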
tty103



Joined: 19 Oct 2009
Posts: 8

Posted: Wed Apr 13, 2011 1:10 pm    Post subject: CUDA Fortran + float3/float4

I would guess nvcc could put all four elements contiguously in one segment if four threads access the four elements. If one thread accesses all four elements, nvcc could put the four elements at the same location in four separate memory segments. In Fortran, the user has to do the memory management.
Say there are 10 points: we could use a my_pnt(1:10)%xyzw(1:4) style
or an x(1:10), y(1:10), z(1:10) and w(1:10) style, depending on how you plan to access them.
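
The two layouts described above could be sketched as follows. This is plain host Fortran for illustration only; in CUDA Fortran the arrays would presumably also carry the `device` attribute, which is left out here. The type name `point` and the program name are assumptions, not anything from the PGI documentation. The code indexes one point at a time, because Fortran does not allow two array sections in a single designator such as `my_pnt(1:10)%xyzw(1:4)`.

```fortran
program point_layouts
  use iso_fortran_env, only: real32
  implicit none
  integer, parameter :: n = 10

  ! Array-of-structures style: the four components of one point are adjacent.
  type :: point
    real(real32) :: xyzw(4)
  end type point
  type(point) :: my_pnt(n)

  ! Structure-of-arrays style: each component is contiguous across all points.
  real(real32) :: x(n), y(n), z(n), w(n)

  integer :: i

  ! Fill both layouts with the same data: (i, 0, 0, 1) for point i.
  do i = 1, n
    my_pnt(i)%xyzw = [real(i, real32), 0.0_real32, 0.0_real32, 1.0_real32]
    x(i) = real(i, real32)
    y(i) = 0.0_real32
    z(i) = 0.0_real32
    w(i) = 1.0_real32
  end do

  ! Whole-array operation on all four components of one point.
  my_pnt(3)%xyzw = 2.0_real32 * my_pnt(3)%xyzw

  ! Whole-array operation on one component across all points.
  x = 2.0_real32 * x

  print *, 'first component of point 3, first layout :', my_pnt(3)%xyzw(1)
  print *, 'first component of point 3, second layout:', x(3)
end program point_layouts
```

The first style suits code where one thread works on all four components of a point; the second suits code where neighbouring threads read the same component of neighbouring points.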
All times are GMT - 7 Hours