PGI User Forum


Linear Layout of threads in Fortran

 
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Sun Nov 07, 2010 3:25 pm    Post subject: Linear Layout of threads in Fortran

These are probably basic questions, but I want to confirm.
In CUDA C, threads are linearly organized so that threadIdx.x increases fastest, then threadIdx.y, and finally threadIdx.z.
Is this the same in CUDA Fortran?

Another question: memory returned by cudaMalloc() is guaranteed to be aligned; is the same true when using allocate()?

Also, when using such runtime APIs in Fortran, is the data organized column-major, or row-major as in CUDA C?

Thanks,
Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Tue Nov 09, 2010 5:05 pm

Hi Tuan,

Quote:
In CUDA C, threads are linearly organized so that threadIdx.x increases fastest, then threadIdx.y, and finally threadIdx.z.
Is this the same in CUDA Fortran?
Yes. CUDA Fortran matches CUDA C's behavior: threadidx%x varies fastest, then threadidx%y, then threadidx%z.
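A host-side C sketch of that ordering: the linear index of a thread within its block, with x varying fastest, then y, then z. The type `dim3_t` and both helpers are hypothetical stand-ins for CUDA's built-in `threadIdx`/`blockDim`, and indices are 0-based as in CUDA C; in CUDA Fortran, where the threadidx components start at 1, subtract 1 from each component first.

```c
#include <assert.h>

/* Hypothetical stand-in for CUDA's built-in dim3 type; indices are 0-based,
   as in CUDA C (CUDA Fortran's threadidx components start at 1). */
typedef struct { unsigned x, y, z; } dim3_t;

/* Linear index of a thread within its block: x varies fastest, then y,
   then z. */
static unsigned linear_thread_id(dim3_t idx, dim3_t dim) {
    assert(idx.x < dim.x && idx.y < dim.y && idx.z < dim.z);
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

/* Returns 1 if walking the block with x in the innermost loop visits the
   linear ids 0, 1, 2, ... in order and covers every thread, else 0. */
static int ids_follow_x_fastest_order(dim3_t dim) {
    unsigned expected = 0;
    dim3_t t;
    for (t.z = 0; t.z < dim.z; ++t.z)
        for (t.y = 0; t.y < dim.y; ++t.y)
            for (t.x = 0; t.x < dim.x; ++t.x)
                if (linear_thread_id(t, dim) != expected++)
                    return 0;
    return expected == dim.x * dim.y * dim.z;
}
```

For a 4x2x2 block, thread (1,0,0) is linear thread 1, (0,1,0) is thread 4, and (0,0,1) is thread 8.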

Quote:
Another question: memory returned by cudaMalloc() is guaranteed to be aligned; is the same true when using allocate()?
Ultimately, allocate on a device array calls cudaMalloc, so the alignment should be the same.
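If you want to check a pointer's alignment yourself, plain integer arithmetic on the address is enough. A minimal sketch, assuming the 256-byte minimum alignment that the CUDA C Programming Guide states for addresses returned by the runtime's allocation routines; the helper `is_aligned` and the macro `CUDA_MIN_ALIGNMENT` are hypothetical names, and because this is host-only C it is exercised on memory from C11 `aligned_alloc` rather than on a cudaMalloc pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed minimum alignment for pointers returned by the CUDA runtime's
   allocation routines (per the CUDA C Programming Guide). */
#define CUDA_MIN_ALIGNMENT 256u

/* Returns 1 if ptr is a multiple of alignment (which must be a power of
   two), else 0. */
static int is_aligned(const void *ptr, uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

The same `is_aligned(ptr, CUDA_MIN_ALIGNMENT)` check could be applied to a device pointer obtained from cudaMalloc, since it only inspects the address value and never dereferences it.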
Quote:

Also, when using such runtime APIs in Fortran, is the data organized column-major, or row-major as in CUDA C?
You'll need to clarify this one. Which API calls do you want to use? The only ones that seem applicable are the 2D and 3D calls. There, the 'pitch' will be the (padded) number of columns in the array, the width will be the number of columns, and the height will be the number of rows.
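A minimal sketch of how a pitched 2D block is indexed, in the row/column terms used above: each row occupies `pitch` elements, of which only the first `width` hold data. This is an illustration under stated assumptions, not the runtime's logic: `padded_pitch` and `pitched_offset` are hypothetical helpers, the real pitch is chosen by the CUDA runtime (the round-up-to-64 rule here is invented for the example), and everything is counted in elements, whereas CUDA C's cudaMallocPitch reports the pitch in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a row width (in elements) up to a multiple of
   `align` elements, mimicking the padding a pitched allocation may add.
   The real pitch is picked by the CUDA runtime, not by this rule. */
static size_t padded_pitch(size_t width, size_t align) {
    assert(align > 0);
    return (width + align - 1) / align * align;
}

/* Offset, in elements, of (row, col) in a pitched 2D block where each row
   occupies `pitch` elements but only the first `width` are used. */
static size_t pitched_offset(size_t row, size_t col, size_t width, size_t pitch) {
    assert(col < width && width <= pitch);
    return row * pitch + col;
}
```

With a width of 100 elements padded to a pitch of 128, row 1 starts at offset 128 rather than 100; the 28 padding elements at the end of each row are never addressed.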

- Mat
Tuan



Joined: 11 Jun 2009
Posts: 233

Posted: Wed Nov 10, 2010 1:18 pm

mkcolg wrote:
Hi Tuan,

Quote:

Also, when using such runtime APIs in Fortran, is the data organized column-major, or row-major as in CUDA C?
You'll need to clarify this one. Which API calls do you want to use? The only ones that seem applicable are the 2D and 3D calls. There, the 'pitch' will be the (padded) number of columns in the array, the width will be the number of columns, and the height will be the number of rows.

- Mat

I believe that calling allocate(data(M,N)) in Fortran, with 'data' in host memory, sets up data in column-major order.

However, as you pointed out, allocate(d_data(M,N)) with d_data in device memory ultimately calls cudaMalloc(). cudaMalloc() in CUDA C uses row-major order, but I'm not sure about CUDA Fortran. So in this case, will d_data be organized in device memory as column-major or row-major?

Thanks, Mat.

Tuan
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Wed Nov 10, 2010 10:00 pm

Hi Tuan,

You're thinking too hard on this one. cudaMalloc (like malloc on the host) just returns a block of contiguous memory; it doesn't organize that memory. How the memory is indexed is just a conceptual artifact of each language: a Fortran array is column-major whether it lives on the host or the device, and a C array is row-major.
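To make this concrete: the same contiguous block of M*N elements can be read either way, and only the index-to-offset formula differs. A minimal C sketch with hypothetical helper names, comparing the offset Fortran uses for a column-major a(i,j) (default 1-based bounds) with the row-major offset C uses for a[r][c]:

```c
#include <assert.h>
#include <stddef.h>

/* Offset of Fortran element a(i,j) in an array declared a(M,N) with default
   1-based bounds: column-major, so the first index i varies fastest. */
static size_t fortran_offset(size_t i, size_t j, size_t m) {
    assert(i >= 1 && j >= 1 && i <= m);
    return (i - 1) + (j - 1) * m;
}

/* Offset of C element a[r][c] in an array declared a[ROWS][COLS]:
   row-major, so the last index c varies fastest. */
static size_t c_offset(size_t r, size_t c, size_t cols) {
    assert(c < cols);
    return r * cols + c;
}
```

Read from C, a Fortran array d_data(M,N) therefore looks like the transposed array d[N][M] (Fortran d_data(i,j) is C d[j-1][i-1]); the bytes cudaMalloc handed back are the same either way.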

- Mat