Joined: 30 Jun 2004
Location: The Portland Group Inc.
Posted: Mon Apr 16, 2012 9:34 am
My comment about indices applies to C-to-Fortran conversion. C indices are zero-based while Fortran arrays default to a lower bound of one, so without an adjustment you'll be off by one. You just need to either add one to the returned index, or declare your Fortran arrays with a zero lower bound.
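To illustrate the two options, here is a minimal host-only sketch. The names (c_idx, a_one, a_zero) and the array size are made up for the example and are not from your program.

```fortran
program index_offset
  implicit none
  integer, parameter :: n = 5      ! hypothetical array length
  real    :: a_one(n)              ! default Fortran bounds: 1..n
  real    :: a_zero(0:n-1)         ! explicit zero lower bound: 0..n-1
  integer :: c_idx                 ! hypothetical zero-based index, as C code would produce it
  integer :: i

  do i = 1, n
     a_one(i) = 10.0 * real(i)
  end do
  a_zero = a_one                   ! same values and size, only the bounds differ

  c_idx = 2                        ! zero-based, so this refers to the third element

  ! Option 1: keep the default bounds and add one to the index
  print *, a_one(c_idx + 1)

  ! Option 2: declare the array with a zero lower bound and use the index as-is
  print *, a_zero(c_idx)
end program index_offset
```

Both print statements pick up the same (third) element. Which option is less error-prone probably depends on how much of the indexing arithmetic was carried over from C.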
|Well, about zero-based indexes, I'm using array unrolling to avoid designing functions for various |
Either method will work so long as the threads in each block access contiguous memory. Personally, I prefer using the explicit dimensions since it's easier to keep track of the indexing. Fortran is column-major, meaning that the first dimension is the one that is contiguous in memory. So if "i" is the index of your first dimension, adjacent threads should be striding across "i" and not "k". This is the opposite of C, which is row-major.
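As a rough host-side sketch of what column-major means for the "idx" versus "i, j, k" question, the following fills a 3D array with the one-based linear position of each element and then checks that this matches the storage order. The dimensions nx, ny, nz and the array names are assumptions for the example only.

```fortran
program column_major
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2   ! hypothetical extents
  integer :: a(nx, ny, nz)
  integer :: flat(nx*ny*nz)
  integer :: i, j, k, idx
  logical :: ok

  ! The innermost loop runs over the first dimension, so consecutive
  ! iterations touch consecutive memory locations.
  do k = 1, nz
     do j = 1, ny
        do i = 1, nx
           idx = i + (j-1)*nx + (k-1)*nx*ny      ! one-based linear position
           a(i, j, k) = idx
        end do
     end do
  end do

  ! reshape takes the elements in array element (storage) order
  flat = reshape(a, [nx*ny*nz])

  ok = .true.
  do idx = 1, nx*ny*nz
     if (flat(idx) /= idx) ok = .false.
  end do
  print *, 'first dimension is contiguous: ', ok
end program column_major
```

In a kernel, the same reasoning suggests mapping the fastest-varying thread index (threadidx%x) to "i", whether you address the array as a(i,j,k) or through a flattened "idx".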
|You'll see that this is sort of a ugly code, because every time we need to access variables "i, j, k" to get the needed value. "idx" here looks more compact, faster |
Note, your program has a major issue which is preventing the kernels from launching. Take a look at your launch configuration. Can you tell what's wrong? Also, especially during development, add error checking after your kernel launches to ensure your kernels are actually executing.
call calculate3D<<<blocks>>>(DIM3_dev3, DIM3_dev2, DIM3_dev1)
istat = cudaGetLastError()
if (istat .gt. 0) then
print *, cudaGetErrorString(istat)
counter = counter+1