PGI User Forum

No parallel kernels found, accelerator region ignored

 
gbj



Joined: 05 Feb 2010
Posts: 2

Posted: Tue Feb 09, 2010 6:43 pm    Post subject: No parallel kernels found, accelerator region ignored

I modified the f2.f program (which compiles and runs as expected) from the following article:
http://www.pgroup.com/lit/articles/insider/v1n1a1.htm
The modified version is:

        program main
        use accel_lib
        integer :: n,n1 ! size of the vector
        real,dimension(:),allocatable :: a ! the vector
        real,dimension(:),allocatable :: b ! the vector
        real,dimension(:),allocatable :: r ! the results
        real,dimension(:),allocatable :: e ! expected results
        integer :: i
        integer :: c0, c1, c2, c3, cgpu, chost
        character(10) :: arg1
        if( iargc() .gt. 0 )then
           call getarg( 1, arg1 )
           read(arg1,'(i10)') n
        else
           n = 100000
        endif
        n1 = 1
        if( n .le. 0 ) n = 100000
        allocate(a(n))
        allocate(b(n))
        allocate(r(n))
        allocate(e(n))
        do i = 1,n
           a(i) = i*2.0
           b(i) = i*2.0
        enddo
        call system_clock( count=c1 )
!       call acc_init( acc_device_nvidia )
!$acc region
        do i = n1,n
           r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
        enddo
!$acc end region
        call multiply1()
        call system_clock( count=c2 )
        cgpu = c2 - c1
        do i = 1,n
           e(i) = sin(a(i)) ** 2 + cos(a(i)) ** 2
        enddo
        call system_clock( count=c3 )
        chost = c3 - c2
!       check the results
        do i = 1,n
           if( abs(r(i) - e(i)) .gt. 0.000001 )then
              print *, i, r(i), e(i)
           endif
        enddo
        print *, n, ' iterations completed'
        print *, cgpu, ' microseconds on GPU'
        print *, chost, ' microseconds on host'

        contains

        subroutine multiply1()

!       call acc_init( acc_device_nvidia )
!$acc region
        do i = n1,n
           r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
        enddo
!$acc end region
        end subroutine

        end program


When I compile this I get the following errors:

main:
     29, No parallel kernels found, accelerator region ignored
     31, Accelerator restriction: induction variable live-out from loop: i
     32, Accelerator restriction: induction variable live-out from loop: i
         Accelerator restriction: induction variable live-out from loop: .dY0002
multiply1:
     57, No parallel kernels found, accelerator region ignored
     59, Accelerator restriction: induction variable live-out from loop: i
     60, Accelerator restriction: induction variable live-out from loop: i
         Accelerator restriction: induction variable live-out from loop: .dY0005

Does anyone know what is going on?
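
One Fortran detail that may be relevant to the "live-out" messages, though nothing in this thread confirms it as the cause: multiply1 declares no variables of its own, so the i, n1, n, a, b and r it uses are the main program's, reached through host association. The minimal sketch below only demonstrates that sharing. The program and subroutine names are hypothetical and no accelerator directives are involved.

```fortran
        program host_assoc_demo
        implicit none
        integer :: i
        i = 0
        call count_to_three()
        ! The contained routine looped on the host's i, so the
        ! value left behind by its do loop is visible here.
        print *, 'i after the call: ', i
        contains
        subroutine count_to_three()
        ! No local declaration of i: this is main's i.
        do i = 1, 3
        enddo
        end subroutine
        end program
```

The program prints 4, the value the do loop leaves in the shared index. Adding `integer :: i` inside the contained routine would give it its own index; whether that alone changes the compiler's diagnosis in this release is an open question here.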
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Feb 10, 2010 10:38 am

Hi gbj,

This looks like a compiler error being caused by the use of the contained subroutine. I've sent a report to our engineers (TPR#16595) and hopefully we can have this fixed soon.

The workaround is to turn the contained subroutine into an external subroutine and pass the arrays and loop bounds to it as arguments:

Code:
% cat f1.f
        program main
        use accel_lib
        implicit none
        integer :: n,n1         ! size of the vector
        real,dimension(:),allocatable :: a ! the vector
        real,dimension(:),allocatable :: b ! the vector
        real,dimension(:),allocatable :: r ! the results
        real,dimension(:),allocatable :: e ! expected results
        integer :: i,ii,iargc
        integer :: c0, c1, c2, c3, cgpu, chost
        character(10) :: arg1
        if( iargc() .gt. 0 )then
           call getarg( 1, arg1 )
           read(arg1,'(i10)') n
        else
           n = 100000
        endif
        n1 = 1
        if( n .le. 0 ) n = 100000
        allocate(a(n))
        allocate(b(n))
        allocate(r(n))
        allocate(e(n))
        do i = 1,n
           a(i) = i*2.0
           b(i) = i*2.0
        enddo
        call acc_init( acc_device_nvidia )
        call system_clock( count=c1 )

!$acc region
        do i = n1,n
           r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
        enddo
!$acc end region

        call multiply1(r,a,b,n1,n)
        call system_clock( count=c2 )
        cgpu = c2 - c1
        do i = 1,n
           e(i) = sin(a(i)) ** 2 + cos(a(i)) ** 2
        enddo
        call system_clock( count=c3 )
        chost = c3 - c2
!       check the results
        do i = 1,n
           if( abs(r(i) - e(i)) .gt. 0.000001 )then
              print *, i, r(i), e(i)
           endif
        enddo
        print *, n, ' iterations completed'
        print *, cgpu, ' microseconds on GPU'
        print *, chost, ' microseconds on host'

        end program


        subroutine multiply1(r,a,b,n1,n)
        implicit none
        real,dimension(*) :: a ! the vector
        real,dimension(*) :: b ! the vector
        real,dimension(*) :: r ! the results
        integer :: n, n1, i

!       call acc_init( acc_device_nvidia )
!$acc region
        do i = n1,n
            r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
        enddo
!$acc end region
        end subroutine

% pgf90 -ta=nvidia,time -Minfo=accel f1.f -V10.2 -fastsse -o f1.out
main:
     31, Generating copyin(b(1:n))
         Generating copyin(a(1:n))
         Generating copyout(r(1:n))
     32, Loop is parallelizable
         Accelerator kernel generated
         32, !$acc do parallel, vector(256)
multiply1:
     66, Generating copyin(b(n1:n))
         Generating copyin(a(n1:n))
         Generating copyout(r(n1:n))
     67, Loop is parallelizable
         Accelerator kernel generated
         67, !$acc do parallel, vector(256)
%
% f1.out
       100000  iterations completed
         2699  microseconds on GPU
         1432  microseconds on host

Accelerator Kernel Timing data
/tmp/f1.f
  multiply1
    66: region entered 1 time
        time(us): total=1211
                  kernels=155 data=1056
        67: kernel launched 1 times
            grid: [391]  block: [256]
            time(us): total=155 max=155 min=155 avg=155
/tmp/f1.f
  main
    31: region entered 1 time
        time(us): total=1482
                  kernels=164 data=1318
        32: kernel launched 1 times
            grid: [391]  block: [256]
            time(us): total=164 max=164 min=164 avg=164
acc_init.c
  acc_init
    41: region entered 1 time
        time(us): init=4293831
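
A note on reading the two timing lines printed by f1.out: the program labels raw system_clock differences as microseconds, which holds only if the clock ticks one million times per second. The tick rate is processor dependent and can be queried through the count_rate argument. A minimal sketch, assuming default integers as in the program above (the program name and the dummy workload are made up for illustration):

```fortran
        program clock_units
        implicit none
        integer :: c1, c2, rate, i
        real :: x, elapsed_us
        ! count_rate returns ticks per second (0 if no clock).
        call system_clock( count=c1, count_rate=rate )
        x = 0.0
        do i = 1, 1000000
           x = x + sin(real(i)) ** 2
        enddo
        call system_clock( count=c2 )
        if( rate .gt. 0 )then
           ! Convert ticks to microseconds; wraparound of the
           ! counter between the two calls is not handled.
           elapsed_us = real(c2 - c1) * 1.0e6 / real(rate)
           print *, 'ticks per second: ', rate
           print *, elapsed_us, ' microseconds elapsed'
        else
           print *, 'no system clock available'
        endif
        ! Print x so the loop is not optimized away.
        print *, x
        end program
```

If the reported rate is 1000000, the cgpu and chost values can be read as microseconds directly; otherwise they need the same scaling.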


Thanks,
Mat
gbj



Joined: 05 Feb 2010
Posts: 2

Posted: Wed Feb 10, 2010 1:28 pm

Thank you Mat. Can you please notify me when this update has been applied?
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Thu Feb 11, 2010 2:21 pm

Hi Gustaaf,

I've added you to the notification list for TPR#16595. I'll also update this post once a fix is available.

- Mat
All times are GMT - 7 Hours