PGI User Forum

OpenACC directive "acc parallel"

 
Minh Duc Nguyen



Joined: 15 Apr 2010
Posts: 7

Posted: Wed Jun 27, 2012 6:47 am    Post subject: OpenACC directive "acc parallel"

Hello,
I've installed the new compiler version 12.5.
I have a standard Jacobi iterative method program.
I use the OpenACC directive set.
The program freezes on every run. If I change the "acc parallel" directive to "acc region", everything works.
Here is the source code of Jacobi using the "acc parallel" directive:

Code:
! Jacobi 9-point stencil operation, simplest case
!
subroutine jacobi( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
 real :: w0, w1, w2, tolerance
 integer :: n, m
 real, dimension(:,:) :: a, newa
 real, intent(out) :: change
 integer, intent(out) :: iters

 integer :: i,j

 change = tolerance + 1 ! get into the while loop

 iters = 0
 do while ( change > tolerance )
  iters = iters + 1
  change = 0
!$acc parallel
    do j = 2, n-1
      do i = 2, m-1
        newa(i,j) = w0 * a(i,j) + &
        w1 * (a(i-1,j) + a(i,j-1) + a(i+1,j) + a(i,j+1) ) + &
        w2 * (a(i-1,j-1) + a(i-1,j+1) + a(i+1,j-1) + a(i+1,j+1) )
        change = max( change, abs( newa(i,j) - a(i,j) ) )
      enddo
    enddo
    a(2:m-1,2:n-1) = newa(2:m-1,2:n-1)
!$acc end parallel
 enddo
end subroutine


Code:
pgfortran -acc -Minfo=all  -c J1.f90 -Minfo=accel
NOTE: your trial license will expire in 12 days, 6.32 hours.
NOTE: your trial license will expire in 12 days, 6.32 hours.
jacobi:
     18, Accelerator kernel generated
         19, CC 1.0 : 17 registers; 112 shared, 28 constant, 0 local memory bytes
             CC 2.0 : 22 registers; 0 shared, 132 constant, 0 local memory bytes
         20, !$acc loop vector(256) ! threadidx%x
         24, Max reduction generated for change
         27, !$acc loop vector(256) ! threadidx%x
     18, Generating copyout(newa(2:m-1,2:n-1))
         Generating copyin(a(:m,:n))
         Generating copyout(a(2:m-1,2:n-1))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     19, Loop is parallelizable
     20, Loop is parallelizable
     27, Loop is parallelizable
pgfortran -o J1.exe -acc -Minfo=all  Jmain.o J1.o


This version freezes every time I try to run it.

Here is the same source code using the "acc region" directive:
Code:


! Jacobi 9-point stencil operation, simplest case
!
subroutine jacobi( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
 real :: w0, w1, w2, tolerance
 integer :: n, m
 real, dimension(:,:) :: a, newa
 real, intent(out) :: change
 integer, intent(out) :: iters

 integer :: i,j

 change = tolerance + 1 ! get into the while loop

 iters = 0
 do while ( change > tolerance )
  iters = iters + 1
  change = 0
!$acc region
    do j = 2, n-1
      do i = 2, m-1
        newa(i,j) = w0 * a(i,j) + &
        w1 * (a(i-1,j) + a(i,j-1) + a(i+1,j) + a(i,j+1) ) + &
        w2 * (a(i-1,j-1) + a(i-1,j+1) + a(i+1,j-1) + a(i+1,j+1) )
        change = max( change, abs( newa(i,j) - a(i,j) ) )
      enddo
    enddo
    a(2:m-1,2:n-1) = newa(2:m-1,2:n-1)
!$acc end region
 enddo
end subroutine

Code:


pgfortran -acc -Minfo=all  -c J1.f90 -Minfo=accel
NOTE: your trial license will expire in 12 days, 6.27 hours.
NOTE: your trial license will expire in 12 days, 6.27 hours.
jacobi:
     18, Generating copyout(newa(2:m-1,2:n-1))
         Generating copyin(a(:m,:n))
         Generating copyout(a(2:m-1,2:n-1))
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
     19, Loop is parallelizable
     20, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
         20, !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             CC 1.0 : 16 registers; 112 shared, 36 constant, 0 local memory bytes
             CC 2.0 : 22 registers; 16 shared, 120 constant, 0 local memory bytes
         24, Max reduction generated for change
     27, Loop is parallelizable
         Accelerator kernel generated
         27, !$acc do parallel, vector(16) ! blockidx%y threadidx%y
             !$acc do parallel, vector(16) ! blockidx%x threadidx%x
             CC 1.0 : 8 registers; 80 shared, 12 constant, 0 local memory bytes
             CC 2.0 : 10 registers; 16 shared, 96 constant, 0 local memory bytes
pgfortran -o J1.exe -acc -Minfo=all  Jmain.o J1.o

This version runs without problems.

PGI compiler version 12.3 compiles the "acc parallel" directive correctly.

Question: What's wrong with the new version 12.5?
Minh Duc Nguyen



Joined: 15 Apr 2010
Posts: 7

Posted: Wed Jun 27, 2012 1:35 pm    Post subject:

I forgot to post the main program of the Jacobi method:
Code:


! main routine to call any of the accelerator model Jacobi routines
!
program main
 interface
  subroutine jacobi( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
   real :: w0, w1, w2, tolerance
   integer :: n, m
   real, dimension(:,:) :: a, newa
   real, intent(out) :: change
   integer, intent(out) :: iters
  end subroutine
 end interface

 integer nargs
 integer n, m
 character*10 arg
 real, allocatable :: a(:,:), newa(:,:)
 real :: delta
 integer :: iters

 integer :: dt1(8), dt2(8), t1, t2
 real :: rt

 n = 400
 nargs = iargc()
 if( nargs == 0 )then
   print *, 'jacobi size1 [size2, defaults to size1]'
   stop   ! RETURN is not standard in a main program
 endif
 if( nargs >= 1 )then
  call getarg( 1, arg )
  read(arg,*) n   ! list-directed read; '(i)' without a width is a compiler extension
 endif

 m = n
 if( nargs >= 2 )then
  call getarg( 2, arg )
  read(arg,*) m
 endif

 allocate( a(m,n) )
 allocate( newa(m,n) )

 do j = 1,n
  do i = 1,m
   a(i,j) = 0
   newa(i,j) = 0
  enddo
 enddo

 do i = 1, m
  a(i,n) = i
 enddo
 do j = 1, n
  a(m,j) = j
 enddo
 a(m,n) = m+n
 
 call date_and_time( values=dt1 )
 call jacobi( a, newa, n, m, .2, .1, .1, .1, delta, iters )
 call date_and_time( values=dt2 )
 ! milliseconds since midnight: ms + 1000*(s + 60*(min + 60*h))
 t1 = dt1(8) + 1000*(dt1(7)+60*(dt1(6)+60*dt1(5)))
 t2 = dt2(8) + 1000*(dt2(7)+60*(dt2(6)+60*dt2(5)))
 write(*,10) delta, iters, n, m
10 format( 'reached delta=', f15.6, ' in ', i0, ' iterations for ', i4, ' x ', i4, ' array' )
 rt = (t2 - t1)
 rt = rt / 1000.
 write(*,20) rt
20 format( 'time=', f15.6, ' seconds' )

end program
toepfer



Joined: 04 Dec 2007
Posts: 48

Posted: Wed Jun 27, 2012 3:57 pm    Post subject:

The 12.5 release is still considered an early-access (beta) release with regard to OpenACC functionality. The upcoming 12.6 release, which will be fully OpenACC 1.0 compliant, compiles and runs your example program without errors.
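
A side note on the directive itself, offered as a sketch and not as a diagnosis of the 12.5 hang. In OpenACC 1.0, "acc kernels" is the construct that behaves like the PGI "acc region" directive, where the compiler picks the loops to turn into kernels. A bare "acc parallel" shares work across gangs only for loops marked with "acc loop". The variant below marks the loops and the max reduction explicitly and splits the copy-back into its own parallel loop. The names jacobi_sketch and jacobi_parallel_loop are hypothetical, and the intent attributes are additions that are not in the posted code. Without -acc the directives are plain comments, so the routine also runs on the host.

```fortran
! Sketch only: assumes OpenACC 1.0 "parallel loop" semantics.
! Module and subroutine names are hypothetical, not from the original post.
module jacobi_sketch
  implicit none
contains
  subroutine jacobi_parallel_loop( a, newa, n, m, w0, w1, w2, tolerance, change, iters )
    real, intent(in) :: w0, w1, w2, tolerance
    integer, intent(in) :: n, m
    real, dimension(:,:), intent(inout) :: a, newa
    real, intent(out) :: change
    integer, intent(out) :: iters
    integer :: i, j

    change = tolerance + 1 ! get into the while loop
    iters = 0
    do while ( change > tolerance )
      iters = iters + 1
      change = 0
      ! Stencil sweep: both loops are work-shared, "change" is a max reduction.
      !$acc parallel loop reduction(max:change)
      do j = 2, n-1
        !$acc loop reduction(max:change)
        do i = 2, m-1
          newa(i,j) = w0 * a(i,j) + &
            w1 * (a(i-1,j) + a(i,j-1) + a(i+1,j) + a(i,j+1) ) + &
            w2 * (a(i-1,j-1) + a(i-1,j+1) + a(i+1,j-1) + a(i+1,j+1) )
          change = max( change, abs( newa(i,j) - a(i,j) ) )
        enddo
      enddo
      !$acc end parallel loop
      ! Copy-back in a separate construct, so it starts only after the sweep is done.
      !$acc parallel loop
      do j = 2, n-1
        !$acc loop
        do i = 2, m-1
          a(i,j) = newa(i,j)
        enddo
      enddo
      !$acc end parallel loop
    enddo
  end subroutine
end module
```

To call it from the main program posted above, replace the interface block with "use jacobi_sketch" and call jacobi_parallel_loop with the same arguments. Each parallel construct here still moves the arrays between host and device on every iteration; wrapping the while loop in an "acc data" region would avoid that, but it is left out to keep the sketch close to the original.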
All times are GMT - 7 Hours