PGI User Forum

Problem with CUDA fortran simple program

 
Jony



Joined: 05 Feb 2010
Posts: 3

Posted: Sun Feb 07, 2010 3:13 am    Post subject: Problem with CUDA fortran simple program

Hi, I am a beginner with CUDA Fortran and I am testing the following program. The code is compiled with pgf95 -ta=nvidia sumAB.cuf; it runs but gives me the wrong results. Any suggestions? Thanks,

!----------------module for sumAB--------------------------
module m_sumAB

use cudafor

contains

!-------------kernel subroutine-----------------
attributes(global) subroutine k_sumAB(n,A,B,C)

integer :: i
integer, value :: n

real, dimension (n) :: A,B,C

i=(blockidx%x-1)*blockdim%x+threadidx%x
if (i<=n) C(i)=A(i)+B(i)

end subroutine k_sumAB

!-------------host subroutine--------------------
subroutine h_sumAB(n,bdim,A,B,C)
implicit none
integer :: n,bdim
real, dimension (n) :: A,B,C
real, device, dimension (n) :: Adev,Bdev,Cdev
Adev=A
Bdev=B
call k_sumAB<<<(n+bdim-1)/bdim, bdim>>>(n,Adev,Bdev,Cdev)
C=Cdev

end subroutine h_sumAB

end module m_sumAB
!---------------------------end module----------------------



program sumAB
!----------------------------------------------------
!
!purpose: sum two vectors A and B of n elements
!
!----------------------------------------------------
use m_sumAB

integer i
integer, parameter :: n=1000
integer :: bdim=100

real :: times,timef,sum
real, dimension (n) :: A,B,C,D
!-----------------end declaration variable-----------


!Initialize arrays
A=1.2
B=2.2
C=0.
D=0.

!CPU calculation
call cpu_time(times)
do i=1,n
D(i)=A(i)+B(i)
end do
call cpu_time(timef)

print *,'CPU time required is: ',timef-times,' seconds'


!GPU calculation
call cpu_time(times)
call h_sumAB(n,bdim,A,B,C)
call cpu_time(timef)
print *,'GPU time required is: ',timef-times,' seconds'


!diff between results
sum=0.
do i=1,n
sum=sum+C(i)-D(i)
end do

print *,'Difference between results is: ',sum,C(1),D(1)


pause

end program sumAB
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Mon Feb 08, 2010 12:07 pm    Post subject:

Hi Jony,

I'm not sure. The program seems to get correct answers when I run it.
Code:
% pgf95 sumAB.cuf  -o sumAB.out
% sumAB.out
 CPU time required is:    7.1525574E-06  seconds
 GPU time required is:    8.8712931E-02  seconds
 Difference between results is:     0.000000        3.400000
    3.400000
FORTRAN PAUSE: enter <return> or <ctrl>d to continue> 


(Note that "-ta=nvidia" is for the Accelerator directive based model so has no effect on your code).

Can you please post more information, including a sample of the output, which compiler version you're using, and which GPU you have?

Thanks,
Mat
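
For contrast with Mat's note that "-ta=nvidia" only applies to the directive-based Accelerator model, here is a minimal sketch of the same vector sum written in that style. It is not from the thread. The module name m_sumab_acc and the routine sumab_acc are hypothetical, and the directive uses the later OpenACC spelling "!$acc parallel loop" as an assumption; the PGI 10.x Accelerator model had its own directive syntax. A compiler that does not process the directive treats it as a comment and runs the loop on the CPU.

```fortran
module m_sumab_acc
  implicit none
contains

  ! Directive-based vector sum: the loop is plain Fortran, and the
  ! !$acc line asks an accelerator-aware compiler to offload it.
  ! Without accelerator support the directive is ignored as a comment.
  subroutine sumab_acc(n, a, b, c)
    integer, intent(in)  :: n
    real,    intent(in)  :: a(n), b(n)
    real,    intent(out) :: c(n)
    integer :: i

    !$acc parallel loop
    do i = 1, n
      c(i) = a(i) + b(i)
    end do
  end subroutine sumab_acc

end module m_sumab_acc
```

A caller would fill a and b on the host and call sumab_acc(n, a, b, c); there is no explicit kernel, no chevron launch, and no device arrays, which is why an accelerator-target flag matters for this style but not for the CUDA Fortran kernel in the original post.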
Jony



Joined: 05 Feb 2010
Posts: 3

Posted: Tue Feb 09, 2010 3:03 am    Post subject:

Hi Mat, thanks a lot for replying. I get the following output:

Code:
% pgf95 sumAB.cuf  -o sumAB.out
% sumAB.out
 CPU time required is:    0.000000         seconds
 GPU time required is:   0.2650000         seconds
 Difference between results is:                NaN    -4.2451527E+37
    3.400000
FORTRAN PAUSE: continuing...


I have downloaded and installed the complete PGI Workstation package, release 10.2, 32-bit for Windows. I have Windows XP and my processor is a Centrino dual core. As for the GPU information, I ran the "cufinfo" program provided by PGI and got the following output:

Code:
Device Number: 0
Device Name: GeForce 9200M GE
Total Global Memory: 0.268 Gbytes
sharedMemPerBlock: 16384 bytes
regsPerBlock: 8192
warpSize: 32
maxThreadsPerBlock: 512
maxThreadsDim: 512 x 512 x 64
maxGridSize: 65535 x 65535 x 1
ClockRate: 1.300 GHz
Total Const Memory: 65536 bytes
Compute Capability Revision: 1.1
TextureAlignment: 256 bytes
deviceOverlap: F
multiProcessorCount: 1
integrated: F
canMapHostMemory: F
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Tue Feb 09, 2010 9:42 am    Post subject:

Hi Jony,

Try using the flag "-Mcuda=cc11" to tell the compiler that your device has compute capability 1.1. By default the compiler targets cc 1.3. If that works, create a "$PGI/win32/10.x/bin/sitenvrc" file (replace 'x' with the actual release number) containing the following line to make cc 1.1 the default.
Quote:
set COMPUTECAP=1.1;


- Mat
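
The NaN and garbage values in the failing run are what you would expect if the kernel never executed and C was copied back from uninitialized device memory. A minimal sketch, not from the thread, of how the host routine could check for that: it queries cudaGetLastError right after the launch and stops with a message instead of returning garbage. The module and program names are hypothetical, and cudaDeviceSynchronize is the current runtime call (an assumption for this thread, since toolkits from 2010 used cudaThreadSynchronize instead). Whether the check would have reported this particular compute capability mismatch is likely but not confirmed by the thread.

```fortran
module m_sumAB_checked
  use cudafor
  implicit none
contains

  ! Same kernel as in the original post: one thread per element.
  attributes(global) subroutine k_sumAB(n, A, B, C)
    integer, value :: n
    real :: A(n), B(n), C(n)
    integer :: i
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) C(i) = A(i) + B(i)
  end subroutine k_sumAB

  ! Host wrapper that checks the launch and the execution status.
  subroutine h_sumAB(n, bdim, A, B, C)
    integer, intent(in)  :: n, bdim
    real,    intent(in)  :: A(n), B(n)
    real,    intent(out) :: C(n)
    real, device :: Adev(n), Bdev(n), Cdev(n)
    integer :: istat

    Adev = A
    Bdev = B

    ! Round the block count up so every element is covered.
    call k_sumAB<<<(n + bdim - 1) / bdim, bdim>>>(n, Adev, Bdev, Cdev)

    ! Launch errors (bad configuration, no usable device code, ...)
    istat = cudaGetLastError()
    if (istat /= cudaSuccess) then
      print *, 'kernel launch failed: ', trim(cudaGetErrorString(istat))
      stop 1
    end if

    ! Errors raised while the kernel was running.
    ! Assumption: older toolkits need cudaThreadSynchronize here.
    istat = cudaDeviceSynchronize()
    if (istat /= cudaSuccess) then
      print *, 'kernel execution failed: ', trim(cudaGetErrorString(istat))
      stop 1
    end if

    C = Cdev
  end subroutine h_sumAB

end module m_sumAB_checked

program sumAB_checked
  use m_sumAB_checked
  implicit none
  integer, parameter :: n = 1000, bdim = 100
  real :: A(n), B(n), C(n)

  A = 1.2
  B = 2.2
  call h_sumAB(n, bdim, A, B, C)
  print *, 'C(1) = ', C(1)
end program sumAB_checked
```

Built the way Mat suggests (adding "-Mcuda=cc11" to the pgf95 command line for a compute capability 1.1 device), the program should print the sum; built for a target the device cannot run, the check gives an error message to act on rather than a silent wrong answer.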
Jony



Joined: 05 Feb 2010
Posts: 3

Posted: Thu Feb 11, 2010 5:29 am    Post subject:

Thanks a lot Mat, that was the problem! Now it works fine :-)

Jony
All times are GMT - 7 Hours