SarahA
Joined: 29 Aug 2006 Posts: 16
Posted: Tue Mar 16, 2010 9:58 am Post subject: Strong typing and memory copy
Hello,
I have a Fortran 90 code that I want to try with CUDA Fortran (2.3). It uses a lot of array aliasing: the original authors sometimes refer to double complex 5-D arrays as double precision arrays of reduced dimension. This was done with argument aliasing, for example:
real array(18, N, 2, 4)
call subr( array )
...
subroutine subr( array )
complex array(3,3,N,2,4)
This doesn't work with Fortran modules, and modules seem to be the best way to write sets of CUDA device arrays and the routines that use them. The compiler complains at length about formal parameter mismatches, etc.
It doesn't help that EQUIVALENCE is documented as unsupported for CUDA 2.3 ... it does compile without error, but it doesn't seem to work. Of course, I could have another error.
So, my long-winded questions are:
1) Is there a way to relax type conformance on array copies? ( host = device, etc ).
2) I have tried using the cudaMemcpy() routine, but it fails with errors like:
copyin Memcpy FAILED:4
Perhaps it is because these arrays are multi-dimensional.
3) The runtime routines are not really well documented. Is there somewhere that gives more detail on things like "cudaMemcpyToArray"? For instance, what are the dstx, dsty parameters?
4) Is device array equivalence really supposed to work? Section 3.2.1 of the v1.2 CUDA Fortran guide says no.
Thanks,
Sarah
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Mar 16, 2010 3:53 pm
Hi Sarah,
Quote:
1) Is there a way to relax type conformance on array copies? ( host = device, etc ).
Sorry, no. This is not specifically a CUDA Fortran issue, though; you would have the same problem with any module.
The workaround is to write your CUDA Fortran module using complex, and have "subr" call the kernel. Something like:
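Code:
module bar
contains
  ! Kernel: each thread writes one complex element.
  attributes(global) subroutine foo_kernel(x)
    complex, device, dimension(*) :: x
    integer :: i
    i = threadidx%x
    x(i) = cmplx(i*2.0-1.0, i*2.0)
    return
  end subroutine foo_kernel
end module bar

! External (non-module) wrapper: it sees the array as complex.
subroutine foo(x)
  use bar
  complex, device, dimension(*) :: x
  call foo_kernel<<<1,50>>>(x)   ! 50 complex values fill 100 reals
end subroutine foo

program test
  use cudafor
  ! The caller's interface deliberately declares x as real, so the same
  ! device memory is viewed as real here and as complex inside foo.
  interface
    subroutine foo(x)
      real, device, dimension(*) :: x
    end subroutine
  end interface
  real, allocatable, device, dimension(:) :: a
  real ha(100)
  allocate(a(100))
  ha = -1
  a = 0.0
  call foo(a)
  ha = a
  print *, ha(1), ha(2), ha(99), ha(100)
end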
Quote:
It doesn't help that EQUIVALENCE is documented as unsupported for CUDA 2.3 ... it does compile without error, but it doesn't seem to work. Of course, I could have another error.
The compiler should catch this and give an error if EQUIVALENCE is used. I've submitted a problem report (TPR#16726) to have this fixed.
Hope this helps,
Mat
SarahA
Joined: 29 Aug 2006 Posts: 16
Posted: Thu Mar 18, 2010 11:33 am Post subject: CUDA Fortran runtime library documentation
Thanks for the idea.
Any pointers on documentation for the CUDA Fortran runtime memory copy routines? The manual, even the newest one, is a little thin there. I'll try matching them up with the CUDA C equivalents to decode the "dstx" and "dsty" arguments and so on.
Perhaps a runtime call to do copies would solve my problem.
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Thu Mar 18, 2010 12:50 pm
Hi SarahA,
Quote:
Any pointers on documentation for the CUDA Fortran run time memory copy routines? The manual, even the newest one, is a little thin there. I'll try matching them up with the CUDA C equivalents to decode the "dstx" and "dsty" arguments and so on.
Yes, the most frequent complaint we get from users is the lack of documentation and examples. It will get better over time.
The other complaint we get is that when we do get questions like this, we tend to send users to the NVIDIA CUDA C documentation, which is what I'll need to do here (sorry). The Fortran "cuda" routines are just calls to the CUDA C versions, so the NVIDIA documentation should apply:
http://developer.download.nvidia.com/compute/cuda/3_0-Beta1/toolkit/docs/online/group__CUDART__MEMORY.html
Note that texture memory and CUDA arrays aren't yet supported, so the routines that use them are there only for CUDA C compatibility.
- Mat
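For readers following the thread, here is a minimal sketch of the `cudaMemcpy` runtime call as exposed by the `cudafor` module. It is not from the original posts. The program name and the extent `n` are hypothetical, and the generic interface is assumed to accept arrays of this rank in the compiler release at hand. As far as the PGI documentation describes it, `count` in the Fortran interface is given in elements, not in bytes as in CUDA C; treat that as something to verify against your own manual.

```fortran
program memcpy_sketch
  use cudafor
  implicit none
  integer, parameter :: n = 8                    ! hypothetical extent
  complex, device, allocatable :: d(:,:,:,:,:)   ! device copy
  complex :: h(3,3,n,2,4)                        ! host copy, same type and shape
  integer :: istat

  allocate(d(3,3,n,2,4))
  h = (1.0, -1.0)

  ! Host to device. The third argument is assumed to be an element count.
  istat = cudaMemcpy(d, h, size(h), cudaMemcpyHostToDevice)
  if (istat /= cudaSuccess) print *, 'copyin failed: ', cudaGetErrorString(istat)

  ! Device to host, after clearing the host array.
  h = (0.0, 0.0)
  istat = cudaMemcpy(h, d, size(h), cudaMemcpyDeviceToHost)
  if (istat /= cudaSuccess) print *, 'copyout failed: ', cudaGetErrorString(istat)

  print *, h(1,1,1,1,1)
  deallocate(d)
end program memcpy_sketch
```

Both arrays here have the same type and shape, so the sketch says nothing about copying between a real host array and a complex device array; it only shows the call shape and the error check that turns a bare status number into a message.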
SarahA
Joined: 29 Aug 2006 Posts: 16
Posted: Fri Mar 19, 2010 10:43 am
mkcolg wrote:
Hi SarahA,
The other complaint we get is that when we do get questions like this, we tend to send users to the NVIDIA CUDA C documentation, which is what I'll need to do here (sorry). The Fortran "cuda" routines are just calls to the CUDA C versions, so the NVIDIA documentation should apply:
http://developer.download.nvidia.com/compute/cuda/3_0-Beta1/toolkit/docs/online/group__CUDART__MEMORY.html
Note that texture memory and CUDA arrays aren't yet supported, so the routines that use them are there only for CUDA C compatibility.
- Mat
Well then, at the risk of my questions getting dumber and dumber...
How does one go about specifying and calling CUDA C kernels from a CUDA Fortran program?
Since there is no linker ... does one specify the source.cu for 'pgfortran' and it just works?
Yes, a lot of these questions will go away once a couple of concrete examples are provided.
Sarah