l4linux
Joined: 03 Jun 2006 Posts: 6
Posted: Sun Mar 28, 2010 3:23 am Post subject: help me with my first CUDA Fortran program.
I downloaded PGI Workstation Complete 10.3 for Linux x86_64, read the PDF documentation, and wrote my first CUDA Fortran program, but it fails to compile.
Thanks a lot.
module cacmu
use cudafor
contains
attributes(global) subroutine cac(n,x)
implicit none
integer :: n
real :: x
integer :: i
x=0.0
do i=1,N
x=x+real(i)
enddo
end subroutine cac
end module cacmu
program main
use cudafor
use cacmu
implicit none
integer :: n_=1000000*64
real :: x_
call cac<<<n_/64,64>>>(n_,x_)
print *,x_
end program main
Then I compiled it as follows:
$ pgf95 1.cuf
PGF90-S-0188-Argument number 1 to cac: type mismatch (1.cuf: 22)
PGF90-S-0188-Argument number 2 to cac: type mismatch (1.cuf: 22)
0 inform, 0 warnings, 2 severes, 0 fatal for main
Last edited by l4linux on Sun Mar 28, 2010 3:46 am; edited 4 times in total
l4linux
Joined: 03 Jun 2006 Posts: 6
Posted: Sun Mar 28, 2010 3:27 am
My GPU is a Gigabyte GT 240 1GB GDDR5 card, which should support CUDA compute capability 1.2.
[root]# pgaccelinfo
CUDA Driver Version 3000
Device Number: 0
Device Name: GeForce GT 240
Device Revision Number: 1.2
Global Memory Size: 1073020928
Number of Multiprocessors: 12
Number of Cores: 96
Concurrent Copy and Execution: Yes
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 16384
Warp Size: 32
Maximum Threads per Block: 512
Maximum Block Dimensions: 512, 512, 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 2147483647B
Texture Alignment 256B
Clock Rate: 1462 MHz
Initialization time: 14333 microseconds
Current free memory 1020030976
Upload time (4MB) 6511 microseconds (1371 ms pinned)
Download time 3893 microseconds ( 961 ms pinned)
Upload bandwidth 644 MB/sec (3059 MB/sec pinned)
Download bandwidth 1077 MB/sec (4364 MB/sec pinned)
BeachHut
Joined: 14 Mar 2010 Posts: 11
Posted: Mon Mar 29, 2010 6:15 am
You have several problems with your code and compilation. Two of them will prevent it from compiling: in your kernel you should declare the arguments as integer, value :: n and real, value :: x, and you must compile with -Mcuda. Another problem is that I don't think you understand how GPU programming works: your kernel has no references to threads. Perhaps you are thinking of the Accelerator model instead (!$acc directives in Fortran, #pragma acc in C).
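A minimal sketch of the two compile fixes mentioned above, under some assumptions that are not from the original post: n is passed with the value attribute, but the result is returned through a one-element device array (the names x_d and x_h are illustrative), because a scalar passed by value cannot carry a result back to the host. The launch uses a single thread and a small n so that only one thread performs the sequential sum.

```fortran
module cacmu
  use cudafor
contains
  attributes(global) subroutine cac(n, x)
    implicit none
    integer, value :: n   ! host scalar, passed by value
    real :: x(1)          ! kernel dummy array: resides in device memory
    integer :: i
    real :: s
    s = 0.0
    do i = 1, n
      s = s + real(i)
    end do
    x(1) = s              ! write the result where the host can copy it back
  end subroutine cac
end module cacmu

program main
  use cudafor
  use cacmu
  implicit none
  integer :: n = 1000       ! small n chosen for illustration
  real, device :: x_d(1)    ! device-side result (assumed name)
  real :: x_h(1)            ! host-side copy (assumed name)
  call cac<<<1, 1>>>(n, x_d)   ! one block, one thread
  x_h = x_d                    ! device-to-host copy by assignment
  print *, x_h(1)
end program main
```

Following the advice in this thread, this would be compiled with something like pgf95 -Mcuda 1.cuf. It only demonstrates the argument passing; it is not a parallel sum.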
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Mar 29, 2010 5:23 pm
Hi l4linux,
In CUDA, every thread executes the same kernel. In your example, every thread sequentially computes the same sum. As BeachHut suggests, you could get this code running, but I doubt it would be very fast or do what you intended.
While you can perform a sum reduction in parallel (I touch upon it in my last PGI Insider Article), it's rather difficult. Instead, you should consider starting with a simple Matmul program (See: http://www.pgroup.com/lit/articles/insider/v1n3a2.htm)
Hopefully, this will get you started. If not, please let us know and we'll try to help further.
- Mat
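To illustrate the point that every thread runs the same kernel, here is a small sketch (not the Matmul example from the linked article) in which each thread uses its block and thread indices to pick one array element. The module name fillmod, the kernel name fill, and the array names a_d and a_h are hypothetical; the sum itself is done on the host to sidestep the parallel reduction, which is the difficult part.

```fortran
module fillmod
  use cudafor
contains
  attributes(global) subroutine fill(n, a)
    implicit none
    integer, value :: n
    real :: a(n)          ! device array
    integer :: i
    ! Global 1-based index of this thread across all blocks
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    ! Guard against threads past the end of the array
    if (i <= n) a(i) = real(i)
  end subroutine fill
end module fillmod

program main
  use cudafor
  use fillmod
  implicit none
  integer, parameter :: n = 1024
  real, device :: a_d(n)   ! device copy
  real :: a_h(n)           ! host copy
  ! Enough 64-thread blocks to cover n elements
  call fill<<<(n + 63) / 64, 64>>>(n, a_d)
  a_h = a_d                ! copy the results back to the host
  print *, sum(a_h)        ! sum on the host
end program main
```

Each thread does a tiny, independent piece of work here, which is the pattern a kernel launched with <<<blocks, threads>>> is meant for.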
l4linux
Joined: 03 Jun 2006 Posts: 6
Posted: Wed Mar 31, 2010 8:44 am
| mkcolg wrote: | Hi l4linux,
In CUDA, every thread executes the same kernel. In your example, every thread sequentially computes the same sum. As BeachHut suggests, you could get this code running, but I doubt it would be very fast or do what you intended.
|
Oh, yes, I'm so stupid.
Thank you all for your messages.