PGI User Forum


thread-local variables

tty103



Joined: 19 Oct 2009
Posts: 8

Posted: Sun Apr 03, 2011 3:37 pm    Post subject: thread-local variables

I have a kernel subroutine like this:

Code:

attributes(global) subroutine FrontSweep_cuda
 ! iflux, jflux, kflux, x_draw, ini_stage, cap_gpu, ... are assumed to be
 ! module-scope device arrays declared elsewhere in the module.

 ! Local variables: each thread gets its own private copy.
 ! (The value attribute is only valid on dummy arguments, so it is not used here.)
 integer :: d, s, w, n_ways
 integer :: i, j
 integer :: fmi, fmj, fmk, fm_id
 real :: in_bound(3), out_bound(3), lax3_factor(3)
 real :: bot, top, avg, IsZero, out

 do s = 3, ncap_gpu

   n_ways = ini_stage(s+1) - ini_stage(s)

   do w = blockidx%x, n_ways, griddim%x
     do d = threadidx%x, num_dir_gpu, blockdim%x

       if (o_pnt_gpu(1) .lt. 0) then
         fmi = cap_gpu(1) - x_draw( ini_stage(s)+w-1 ) + 1
       else
         fmi = x_draw( ini_stage(s)+w-1 )
       endif

       if (o_pnt_gpu(2) .lt. 0) then
         fmj = cap_gpu(2) - y_draw( ini_stage(s)+w-1 ) + 1
       else
         fmj = y_draw( ini_stage(s)+w-1 )
       endif

       if (o_pnt_gpu(3) .lt. 0) then
         fmk = cap_gpu(3) - z_draw( ini_stage(s)+w-1 ) + 1
       else
         fmk = z_draw( ini_stage(s)+w-1 )
       endif

       in_bound(1) = iflux(fmj,fmk,d)
       in_bound(2) = jflux(fmi,fmk,d)
       in_bound(3) = kflux(fmi,fmj,d)

       fm_id = m_matrix_gpu(fmi,fmj,fmk)
       IsZero = 0
       lax3_factor = 1
       bot = sigt_gpu(fm_id)
       top = asrcflx_gpu(fmi,fmj,fmk,d)
       do i = 1, 3
         bot = bot + 2*cos_dager_gpu(i,d)
         top = top + 2*cos_dager_gpu(i,d)*in_bound(i)
       enddo

       ! Repeat until no outgoing flux is negative
       do while (IsZero .eq. 0)

         avg = top/bot
         IsZero = 1

         do i = 1, 3
           if (lax3_factor(i) .eq. 0) cycle

           out = 2*avg - in_bound(i)

           if (out .lt. 0) then
             ! Fix this outgoing flux at zero and re-solve for avg
             out_bound(i) = 0.0
             lax3_factor(i) = 0
             top = top - cos_dager_gpu(i,d)*in_bound(i)
             bot = bot - 2*cos_dager_gpu(i,d)
             IsZero = 0
             exit
           else
             out_bound(i) = out
           endif
         enddo
       enddo !do while

       iflux(fmj,fmk,d) = out_bound(1)
       jflux(fmi,fmk,d) = out_bound(2)
       kflux(fmi,fmj,d) = out_bound(3)

       asrcflx_gpu(fmi,fmj,fmk,d) = avg

     enddo !dir
   enddo !ways

   ! Barrier for the threads of this block only; different blocks are not synchronized
   call syncthreads()
 enddo !stage

 return
end subroutine



All the variables I declared at the beginning are intended to be local to a thread, meaning each thread sees its own values of these variables.
I wonder whether the compiler recognizes them as thread-local, including the arrays in_bound(3), out_bound(3), and lax3_factor(3).
mkcolg



Joined: 30 Jun 2004
Posts: 6211
Location: The Portland Group Inc.

Posted: Mon Apr 04, 2011 9:00 am

Hi tty103,

Quote:
I wonder whether the compiler recognizes them as thread-local, including the arrays in_bound(3), out_bound(3), and lax3_factor(3).
All local variables are local to each thread, so these three arrays do not share storage across threads; every thread gets its own private copy. For storage shared by the threads of a block, you would need to explicitly declare the variables with the 'shared' attribute.

- Mat
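The difference between a plain local array and a 'shared' one can be seen in a small test. This is a minimal sketch, assuming the PGI CUDA Fortran compiler (source in a `.cuf` file) and a CUDA-capable GPU; the names `local_vs_shared`, `demo`, and `check_locality` are hypothetical. Each thread fills its own local array `tmp`, so its result cannot be clobbered by other threads, while `tile` has one copy per block, so after `syncthreads()` a thread can read a value written by its neighbour.

```fortran
module local_vs_shared
  use cudafor
  implicit none
contains
  ! Hypothetical demo kernel; assumes blockDim%x <= 256
  attributes(global) subroutine demo(priv_out, shr_out, n)
    integer, value  :: n                 ! value is legal here: n is a dummy argument
    integer, device :: priv_out(n), shr_out(n)
    integer         :: tmp(3)            ! local: every thread has its own copy
    integer, shared :: tile(256)         ! shared: one copy per thread block
    integer         :: t
    t = threadIdx%x
    tmp = t                              ! no other thread can see or overwrite this
    tile(t) = t
    call syncthreads()                   ! all tile() writes in the block are now visible
    priv_out(t) = tmp(1) + tmp(2) + tmp(3)
    shr_out(t)  = tile(mod(t, blockDim%x) + 1)   ! read a neighbour's slot
  end subroutine demo
end module local_vs_shared

program check_locality
  use cudafor
  use local_vs_shared
  implicit none
  integer, parameter :: n = 256
  integer, allocatable, device :: d_priv(:), d_shr(:)
  integer :: h_priv(n), h_shr(n), t
  allocate(d_priv(n), d_shr(n))
  call demo<<<1, n>>>(d_priv, d_shr, n)
  h_priv = d_priv                        ! device-to-host copies
  h_shr  = d_shr
  do t = 1, n
    if (h_priv(t) /= 3*t) error stop "local array was not private"
    if (h_shr(t) /= mod(t, n) + 1) error stop "unexpected shared value"
  end do
  print *, "ok"
  deallocate(d_priv, d_shr)
end program check_locality
```

The program checks its own results and stops with an error message if either assumption fails.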
tty103



Joined: 19 Oct 2009
Posts: 8

Posted: Mon Apr 04, 2011 9:58 am

Thanks. Another question (sorry if this has been asked before):
Code:

module mCuda
real, allocatable, device :: MyArray(:,:,:)

contains

attributes(global) subroutine test_cuda

integer i,j,k

do i=threadidx%x , 100, blockdim%x
do j=threadidx%y , 100, blockdim%y
do k=threadidx%z,  100, blockdim%z

  MyArray(i,j,k)=1

enddo
enddo
enddo

end subroutine
end module


i, j, and k are local to each thread. How does the compiler know that MyArray is not, i.e. that only one copy of MyArray is allocated rather than one per thread?

If I move i, j, and k out of the subroutine and into the module:
Code:

module mCuda
real, allocatable, device :: MyArray(:,:,:)
integer i, j, k

contains

attributes(global) subroutine test_cuda

do i=threadidx%x , 100, blockdim%x
do j=threadidx%y , 100, blockdim%y
do k=threadidx%z,  100, blockdim%z

  MyArray(i,j,k)=1

enddo
enddo
enddo

end subroutine
end module


does the compiler still know that i, j, and k are local?
mkcolg



Joined: 30 Jun 2004
Posts: 6211
Location: The Portland Group Inc.

Posted: Mon Apr 04, 2011 10:12 am

Quote:
i, j, and k are local to each thread. How does the compiler know that MyArray is not, i.e. that only one copy of MyArray is allocated rather than one per thread?
Because MyArray has module scope, it is visible to all routines within the module, and its single allocation in device memory is shared by all threads.

Quote:
does the compiler still know that i, j, and k are local?
But they aren't local any longer. By moving them to the module data section, you give them module scope, so they are accessible by (and shared among) all threads rather than private to each one.

Also, they are host variables (no device attribute), so you'll have problems accessing them on the device.

- Mat
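Putting both answers together, a minimal sketch of the working pattern, assuming the PGI CUDA Fortran compiler (source in a `.cuf` file) and a CUDA-capable GPU: `MyArray` stays at module scope with the `device` attribute (one copy in GPU memory, allocated once from host code), while `i`, `j`, `k` are declared inside the kernel so each thread has its own. The names `mCuda_fixed` and `run_test` and the 8 x 8 x 4 block shape are illustrative choices, not part of the original code.

```fortran
module mCuda_fixed
  use cudafor
  implicit none
  ! Module scope + device attribute: one copy in GPU global memory,
  ! visible to and shared by every thread that runs test_cuda
  real, allocatable, device :: MyArray(:,:,:)
contains
  attributes(global) subroutine test_cuda()
    ! Declared inside the kernel, so each thread has private i, j, k
    integer :: i, j, k
    do i = threadIdx%x, 100, blockDim%x
      do j = threadIdx%y, 100, blockDim%y
        do k = threadIdx%z, 100, blockDim%z
          MyArray(i,j,k) = 1.0
        end do
      end do
    end do
  end subroutine test_cuda
end module mCuda_fixed

program run_test
  use cudafor
  use mCuda_fixed
  implicit none
  real, allocatable :: h_array(:,:,:)
  integer :: istat
  ! Host code allocates the single module-level device array
  allocate(MyArray(100,100,100))
  allocate(h_array(100,100,100))
  MyArray = 0.0
  ! One block of 8 x 8 x 4 = 256 threads; the strided loops cover every index
  call test_cuda<<<1, dim3(8, 8, 4)>>>()
  istat = cudaDeviceSynchronize()
  h_array = MyArray                      ! device-to-host copy
  if (any(h_array /= 1.0)) error stop "MyArray not fully initialised"
  print *, "ok"
  deallocate(MyArray)
  deallocate(h_array)
end program run_test
```

Because the loops stride by the block dimensions, the same kernel also works with other block shapes; only the coverage per thread changes.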
All times are GMT - 7 Hours