PGI User Forum


CUDA "philosophy"

 
cablesb



Joined: 21 Jan 2010
Posts: 33

Posted: Mon May 21, 2012 5:00 pm    Post subject: CUDA "philosophy"

First, thanks for the help on my last thread re: my coding of the diffusion equation. I have the code going now, but would like to get some "general edification." My kernel for the diffusion equation is:
Code:

attributes(global) subroutine diff_time_stepper(v,diffconst)


real*8 :: v(:,:)
real*8 :: diffconst

real*8 :: vintermed
integer :: i,j,m
integer :: nx, ny

nx=256
ny=256

i=(blockIdx%x-1)*blockDim%x+threadIdx%x
j=(blockIdx%y-1)*blockDim%y+threadIdx%y

if (i<nx .and. j<ny .and. i>1 .and. j>1) then
  vintermed=v(i,j)+diffconst*(v(i-1,j)-2.*v(i,j)+v(i+1,j)+v(i,j-1)-2.*v(i,j)+v(i,j+1))
  v(i,j)=vintermed
! add a source for the heck of it
  if (i==64 .and. j==64) v(i,j)=v(i,j)+1
endif


end subroutine


My question is: When I add 1 to v(64,64), am I risking non-deterministic behavior? It seems arguable to me that the thread that handles (64,64) might, for instance, complete this routine early sometimes and affect the calculation of the thread that handles, for instance, (63,64). And then, sometimes, it might not. But the code seems to be working. As I have thought about this, I have gotten confused about just what might cause non-deterministic behavior. Maybe my first two calculations, where vintermed gets defined and then put back in v, might be problematic? Anyone care to enlighten me or point me to a good discussion on this topic? Thanks.
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Tue May 22, 2012 9:19 am    Post subject:

Quote:
When I add 1 to v(64,64), am I risking non-deterministic behavior?
Yes.

Quote:
It seems arguable to me that the thread that handles (64,64) might, for instance, complete this routine early sometimes and affect the calculation of the thread that handles, for instance, (63,64). And then, sometimes, it might not. But the code seems to be working.
Correct.

Quote:
Maybe my first two calculations, where vintermed gets defined and then put back in v, might be problematic?
Yes, this is problematic.

Quote:
Anyone care to enlighten me or point me to a good discussion on this topic?


Do a web search for "loop carried dependence cuda" and you'll find a lot of discussions. In particular, search Google Books for section 4.5 of "Computer Architecture: A Quantitative Approach" by John L. Hennessy and David A. Patterson, which gives a good overview of loop dependencies.

Note that the best way to fix this code is to break it into two separate kernels (or a kernel and a device-to-device copy), since global synchronization is required between the calculation of "vintermed" and storing the result back to "v".

- Mat
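
Editor's sketch of the "kernel and device-to-device copy" variant suggested above, offered as one possible reading rather than as Mat's or the original poster's actual code. The names diff_compute, vnew, diffusion_kernels, diffusion_driver and nsteps are hypothetical, as are the 16x16 thread block, the value diffconst = 0.1, and the choice to pass nx, ny and diffconst by value. The idea is that every thread reads only the old field v and writes only to a separate array vnew, so no thread can observe a neighbor that has already been updated. The copy back into v happens after the kernel has finished, which supplies the global synchronization point.

```fortran
! Sketch only: all names below are hypothetical, not taken from the thread.
module diffusion_kernels
  use cudafor
  implicit none
contains
  attributes(global) subroutine diff_compute(v, vnew, diffconst, nx, ny)
    integer, value :: nx, ny
    real(8), value :: diffconst
    real(8), intent(in)  :: v(nx,ny)      ! old time level, read only
    real(8), intent(out) :: vnew(nx,ny)   ! new time level, write only
    integer :: i, j

    i = (blockIdx%x-1)*blockDim%x + threadIdx%x
    j = (blockIdx%y-1)*blockDim%y + threadIdx%y

    if (i > 1 .and. i < nx .and. j > 1 .and. j < ny) then
      ! Interior point: same stencil as the original kernel, but the
      ! result goes to vnew, so neighbors still see the old values in v.
      vnew(i,j) = v(i,j) + diffconst*(v(i-1,j) + v(i+1,j) &
                + v(i,j-1) + v(i,j+1) - 4.0d0*v(i,j))
      ! Source term, applied to the new value only.
      if (i == 64 .and. j == 64) vnew(i,j) = vnew(i,j) + 1.0d0
    else if (i <= nx .and. j <= ny) then
      ! Boundary point: carried over unchanged.
      vnew(i,j) = v(i,j)
    end if
  end subroutine diff_compute
end module diffusion_kernels

program diffusion_driver
  use cudafor
  use diffusion_kernels
  implicit none
  integer, parameter :: nx = 256, ny = 256, nsteps = 100   ! assumed sizes
  real(8), parameter :: diffconst = 0.1d0                  ! assumed value
  real(8) :: v(nx,ny)
  real(8), device :: v_d(nx,ny), vnew_d(nx,ny)
  type(dim3) :: grid, tblock
  integer :: step

  v = 0.0d0
  v_d = v                                   ! host-to-device copy
  tblock = dim3(16, 16, 1)
  grid = dim3((nx + 15)/16, (ny + 15)/16, 1)

  do step = 1, nsteps
    call diff_compute<<<grid, tblock>>>(v_d, vnew_d, diffconst, nx, ny)
    ! Device-to-device copy on the default stream; it runs only after
    ! the kernel above has completed for every thread block.
    v_d = vnew_d
  end do

  v = v_d                                   ! device-to-host copy
  print *, 'v(64,64) =', v(64,64)
end program diffusion_driver
```

With this layout the result no longer depends on the order in which threads or blocks happen to run. Swapping the roles of the two device arrays on alternate steps would avoid the copy, at the cost of slightly more bookkeeping in the driver.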
cablesb



Joined: 21 Jan 2010
Posts: 33

Posted: Tue May 22, 2012 9:23 am    Post subject:

OK, I feel like I have gone from not knowing what I don't know to knowing what I don't know. And if that's confusing, think how I feel. Actually, I think this is progress. And thanks for the reference. Much obliged!
All times are GMT - 7 Hours