PGI User Forum

how to compile a !$acc program?

 
l4linux



Joined: 03 Jun 2006
Posts: 6

Posted: Sun Jul 12, 2009 2:30 am    Post subject: how to compile a !$acc program?

I read the PGI 9.0.1 manual and wrote a sample program:

! include 'accel_lib.h'
program main
use accel_lib
implicit none
integer :: i
integer,parameter :: N=100000000
real :: x=0.0
!$acc region do parallel(8), private(x,i)
do i=1,N
x=x+i
enddo
print *, x
end program main

Then I compiled it:
[~]$ pgf95 1.f90 -ta=nvidia
ptxas /tmp/pgaccJTdfPuFB79TC.ptx, line 81; error : State space incorrect for instruction 'st'
ptxas fatal : Ptx assembly aborted due to errors
PGF90-F-0000-Internal compiler error. pgnvd job exited with nonzero status code 0 (1.f90: 13)

What's wrong?

Thanks!
l4linux



Joined: 03 Jun 2006
Posts: 6

Posted: Sun Jul 12, 2009 2:32 am    Post subject: my GPU

[~]$ pgaccelinfo
Device Number: 0
Device Name: GeForce 8400 SE
Device Revision Number: 1.1
Global Memory Size: 267714560
Number of Multiprocessors: 1
Number of Cores: 8
Concurrent Copy and Execution: No
Total Constant Memory: 65536
Total Shared Memory per Block: 16384
Registers per Block: 8192
Warp Size: 32
Maximum Threads per Block: 8192
Maximum Block Dimensions: 512 x 512 x 64
Maximum Grid Dimensions: 65535 x 65535 x 1
Maximum Memory Pitch: 262144B
Texture Alignment 256B
Clock Rate: 918 MHz
mkcolg



Joined: 30 Jun 2004
Posts: 5871
Location: The Portland Group Inc.

Posted: Mon Jul 13, 2009 2:55 pm

Hi l4linux,

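The basic problem here is that your loop is not parallelizable: every iteration reads and updates the same scalar x, so each iteration depends on the result of the previous one. If we remove the "parallel(8)" clause from the "!$acc region do" directive, the compiler correctly detects that the loop is not parallel and does not generate a GPU kernel.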

Code:
pgf90 test.f90 -ta=nvidia -Minfo=accel
main:
      9, No parallel kernels found, accelerator region ignored
     10, Scalar last value needed after loop for x
         Loop carried scalar dependence for x
     11, Accelerator restriction: scalar variable live-out from loop: x


However, when you use the "parallel" clause, you are telling the compiler to go ahead and parallelize the loop anyway. Unfortunately, this leads to nonsensical PTX code, which ptxas then rejects with the error you saw.
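
To make the dependence concrete, here is a small host-only sketch (no accelerator directives; the program name dependence_sketch and the tiny trip count are just for illustration, not taken from your code). It prints what each iteration reads and writes, so you can see that iteration i cannot start until iteration i-1 has stored x:

```fortran
program dependence_sketch
  implicit none
  integer, parameter :: n = 4   ! small trip count, chosen only for readability
  integer :: i
  real :: x

  x = 0.0
  do i = 1, n
     ! Each iteration reads the x written by the previous iteration
     ! and overwrites it: this is the loop-carried scalar dependence.
     print *, 'iteration', i, ': reads x =', x, ', writes x =', x + real(i)
     x = x + real(i)
  end do

  ! x is also needed after the loop (it is "live-out").
  print *, 'final x =', x
end program dependence_sketch
```

Running the iterations in any other order, or at the same time, would change which value of x each one reads, which is why the compiler refuses to build a kernel for this loop unless it is forced to.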

To fix this, promote x to an array, fill the array on the GPU, and then do the reduction on the host. Note that we will support reductions on the GPU in the future, but this support is not available in the 9.0 release.
Code:
 cat test.f90
! include 'accel_lib.h'
program main
use accel_lib
implicit none
integer :: i
integer,parameter :: N=1000000
real :: x=0.0
real :: xarr(N)
!$acc region do
do i=1,N
   xarr(i)=i
enddo

do i=1,N
  x=x+xarr(i)
end do
print *, x
end program main
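
If you want to check the result, one possible variant of the program above is sketched below. The kind parameter dp, the names total and expected, and the comparison against the closed form N(N+1)/2 are my additions for illustration. It accumulates in double precision on the host, because a single-precision sum this large can no longer represent every integer exactly once it passes 2**24. I have left out "use accel_lib" since the sketch makes no library calls.

```fortran
program sum_check
  implicit none
  integer, parameter :: n = 1000000
  integer, parameter :: dp = kind(1.0d0)   ! double-precision kind (illustrative name)
  integer :: i
  real :: xarr(n)
  real(dp) :: total, expected

  ! Same accelerator region as in the example above: each iteration
  ! writes only its own array element, so the iterations are independent.
  !$acc region do
  do i = 1, n
     xarr(i) = i
  end do

  ! Host-side reduction, accumulated in double precision.
  total = 0.0_dp
  do i = 1, n
     total = total + xarr(i)
  end do

  ! Closed form n(n+1)/2, evaluated in double precision so the
  ! intermediate product does not overflow a default integer.
  expected = 0.5_dp * real(n, dp) * real(n + 1, dp)

  print *, 'sum      =', total
  print *, 'expected =', expected
end program sum_check
```

Compile it the same way as before, for example with pgf90 -ta=nvidia -Minfo=accel. A compiler that does not recognize the directive should treat the !$acc line as an ordinary comment and simply run the first loop on the host; either way the two printed values should agree.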


Note that the directory "$PGI/linux86-64/9.0-1/etc/samples" contains several accelerator examples which might be helpful.

- Mat
l4linux



Joined: 03 Jun 2006
Posts: 6

Posted: Tue Jul 14, 2009 11:38 am

Thank you very much!
All times are GMT - 7 Hours