PGI User Forum


Coalesced copy, strong typing, and equivalence

 
TroelsH




Posted: Fri Jan 28, 2011 8:27 am

I have a large array of "particles". Each particle is 48 bytes long:

Code:
  type :: particle
    real r(3)                                             ! fractional positions
    real e                                                ! energy
    real p(3)                                             ! momenta
    real w                                                ! weight
    integer(kind=2) q(3)                                  ! integer positions
    integer(kind=2) bits                                  ! extra bits
    integer*8 i                                           ! id
  end type
  type(particle), dimension(np_total) :: gp               ! global particle array
  type(particle), dimension(np_stick) :: gs               ! stick particle array
  integer, dimension(nx,ny,nz) :: gi

I can easily have ~4 GB of particles on my C1060. The particle array is sorted and organized in cells. I have an index array gi(nx,ny,nz) that gives the index in gp at which the particles of each cell sit.

I need to copy a z-stick of particles from the GPU to the CPU, that is, all particles in cells with a certain (jx,jy) coordinate. To minimize host-GPU data transfers I have a routine that selects the particles in the global array gp that sit in cells with coordinate (jx,jy) and copies them into a contiguous array gs. That array can then be transferred to the CPU.

Apart from the index juggling, what I want is to copy particles from gp to gs with coalesced memory accesses.

Right now I have

Code:
jz = ...   ! index in z-column
np = ...   ! number of particles in cell (jx,jy,jz)
offp = ... ! offset in gp array for cell (jx,jy,jz)
offs = ... ! offset in gs array to copy to
it = threadidx%x
if (it <= np) then
  ip = it + offp ! index in global array of particle it in cell (jx,jy,jz)
  is = it + offs ! index in stick array
  gs(is) = gp(ip)
endif


Essentially each thread copies one particle. That means each thread transfers 48 bytes, so neighbouring threads touch addresses 48 bytes apart, which is terrible for coalescing.
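To make the stride argument concrete, here is a small host-side sketch in plain Fortran that prints the byte offsets touched by the first 16 threads under the two schemes. It assumes the structure assignment is lowered to 4-byte loads and that accesses are grouped per half-warp of 16 threads, as on compute capability 1.x hardware such as the C1060. The names (stride_demo, particle_bytes, and so on) are illustrative only. Under those assumptions, one access from a half-warp spans 15*48 + 4 = 724 bytes when each thread copies a whole particle, against 15*4 + 4 = 64 contiguous bytes when each thread copies one word.

```fortran
program stride_demo
  implicit none
  ! Assumed sizes: 48-byte particle, 4-byte word, 16-thread half-warp
  integer, parameter :: particle_bytes = 48, word_bytes = 4, half_warp = 16
  integer :: it

  ! Byte offset, relative to the start of the cell, of the first word
  ! each thread touches in the two copy schemes
  do it = 1, half_warp
    print '(a,i3,a,i5,a,i5)', 'thread', it, &
      '  per-particle copy:', (it - 1) * particle_bytes, &
      '  per-word copy:', (it - 1) * word_bytes
  end do

  ! Address range covered by a single 4-byte access from one half-warp
  print *, 'range, per-particle copy (bytes):', (half_warp - 1) * particle_bytes + word_bytes
  print *, 'range, per-word copy (bytes):    ', (half_warp - 1) * word_bytes + word_bytes
end program stride_demo
```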

My question is how to do it in such a way that each thread transfers 4-byte blocks and the CUDA hardware is kept happy?

I can only think of two ways, which unfortunately are not supported by the current standard.

1)

If routines were not strongly typed, I would be much better off. Then instead of

type(particle), dimension(np_total) :: gp ! global particle array
type(particle), dimension(np_stick) :: gs ! stick particle array

I could do

integer, dimension(np_total*12) :: gp ! global particle array
integer, dimension(np_stick*12) :: gs ! stick particle array

and copy away.
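A minimal CUDA Fortran sketch of what such a word-wise copy could look like, under the assumption of option 1 that both arrays are available on the device as flat arrays of 4-byte integers. The module, kernel, and argument names (stick_copy, copy_cell_words, nwgp, nwgs, igp_d, igs_d) are hypothetical, offp and offs are taken to be 0-based particle offsets as in the snippet above, and one thread block handles one cell. Consecutive threads read and write consecutive 4-byte words, and a strided loop covers cells with more words than threads.

```fortran
module stick_copy
  use cudafor
  implicit none
  integer, parameter :: words_per_particle = 12   ! 48 bytes / 4 bytes per word
contains
  ! Copy np particles, viewed as 4-byte words, from igp to igs.
  ! offp, offs: 0-based particle offsets into gp and gs (assumption).
  attributes(global) subroutine copy_cell_words(igp, igs, nwgp, nwgs, offp, offs, np)
    integer, value :: nwgp, nwgs, offp, offs, np
    integer, intent(in) :: igp(nwgp)
    integer :: igs(nwgs)
    integer :: w
    ! Thread k handles words k, k + blockdim%x, ... of the cell
    do w = threadidx%x, np * words_per_particle, blockdim%x
      igs(offs * words_per_particle + w) = igp(offp * words_per_particle + w)
    end do
  end subroutine copy_cell_words
end module stick_copy

program demo
  use cudafor
  use stick_copy
  implicit none
  integer, parameter :: np_total = 8, np_stick = 4
  integer, parameter :: nwgp = np_total * words_per_particle
  integer, parameter :: nwgs = np_stick * words_per_particle
  integer :: h(nwgp), out(nwgs), k
  integer, device :: igp_d(nwgp), igs_d(nwgs)

  h = [(k, k = 1, nwgp)]      ! recognizable test pattern
  igp_d = h
  igs_d = 0
  ! Copy 2 particles from particle offset 3 in gp to particle offset 1 in gs
  call copy_cell_words<<<1, 64>>>(igp_d, igs_d, nwgp, nwgs, 3, 1, 2)
  out = igs_d
  ! Words 13..36 of gs should now equal words 37..60 of gp
  print *, all(out(13:36) == h(37:60))
end program demo
```

With default 4-byte integers the products offp * words_per_particle stay in range for about 4 GB of particles (roughly 10^9 words), but larger arrays would need integer(kind=8) index arithmetic.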

2)

Alternatively, with equivalence in my variable declarations I could do something like:

type(particle), dimension(np_total) :: gp ! global particle array
type(particle), dimension(np_stick) :: gs ! stick particle array
integer, dimension(np_total*12) :: igp ! global particle array
integer, dimension(np_stick*12) :: igs ! stick particle array
equivalence (igp, gp)
equivalence (igs, gs)

and the problem would be solved too. In general, an implementation of equivalence would help a lot for memory transfers.

Any suggestions on how to accomplish coalesced transfers on variables of derived type?

Thanks in advance,

Troels
mkcolg



Location: The Portland Group Inc.

Posted: Mon Jan 31, 2011 12:51 pm

Hi Troels,

Unfortunately, I can't think of anything better. If I understand correctly, gp is scattered (i.e., offp is not sequential with respect to the thread ids), so nothing can be done to help there. Maybe you could do something with the store to gs, but I'm not sure it would be worth it.

- Mat