gbj
Joined: 05 Feb 2010 Posts: 2
|
Posted: Tue Feb 09, 2010 6:43 pm Post subject: No parallel kernels found, accelerator region ignored |
|
|
I modified the f2.f program (which compiles and runs
as expected) from the following site:
http://www.pgroup.com/lit/articles/insider/v1n1a1.htm
to:
program main
use accel_lib
integer :: n,n1 ! size of the vector
real,dimension(:),allocatable :: a ! the vector
real,dimension(:),allocatable :: b ! the vector
real,dimension(:),allocatable :: r ! the results
real,dimension(:),allocatable :: e ! expected results
integer :: i
integer :: c0, c1, c2, c3, cgpu, chost
character(10) :: arg1
if( iargc() .gt. 0 )then
call getarg( 1, arg1 )
read(arg1,'(i10)') n
else
n = 100000
endif
n1 = 1
if( n .le. 0 ) n = 100000
allocate(a(n))
allocate(b(n))
allocate(r(n))
allocate(e(n))
do i = 1,n
a(i) = i*2.0
b(i) = i*2.0
enddo
call system_clock( count=c1 )
!call acc_init( acc_device_nvidia )
!$acc region
do i = n1,n
r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
enddo
!$acc end region
call multiply1()
call system_clock( count=c2 )
cgpu = c2 - c1
do i = 1,n
e(i) = sin(a(i)) ** 2 + cos(a(i)) ** 2
enddo
call system_clock( count=c3 )
chost = c3 - c2
! check the results
do i = 1,n
if( abs(r(i) - e(i)) .gt. 0.000001 )then
print *, i, r(i), e(i)
endif
enddo
print *, n, ' iterations completed'
print *, cgpu, ' microseconds on GPU'
print *, chost, ' microseconds on host'
contains
subroutine multiply1()
!call acc_init( acc_device_nvidia )
!$acc region
do i = n1,n
r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
enddo
!$acc end region
end subroutine
end program
When I compile this I get the following errors:
main:
29, No parallel kernels found, accelerator region ignored
31, Accelerator restriction: induction variable live-out from loop: i
32, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: .dY0002
multiply1:
57, No parallel kernels found, accelerator region ignored
59, Accelerator restriction: induction variable live-out from loop: i
60, Accelerator restriction: induction variable live-out from loop: i
Accelerator restriction: induction variable live-out from loop: .dY0005
Does anyone know what is going on?
|
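A side note on the timing lines in the program above: it prints the raw difference of two system_clock counts and labels it as microseconds. The counts are clock ticks, and the tick rate is processor-dependent, so the label is only right when the rate happens to be 1,000,000 per second. The sketch below is a hypothetical standalone program (the names clock_units, rate, s and elapsed_us are not from the original post) that queries count_rate and converts explicitly. It assumes the counter does not wrap around during the measurement.

```fortran
program clock_units
  implicit none
  integer :: c0, c1, rate, i
  real :: s
  double precision :: elapsed_us

  ! count_rate returns the number of clock ticks per second
  ! (0 if the processor has no clock)
  call system_clock( count=c0, count_rate=rate )
  s = 0.0
  do i = 1, 100000
    s = s + sin(real(i)) ** 2
  enddo
  call system_clock( count=c1 )

  if( rate .gt. 0 )then
    ! convert ticks to microseconds using the queried rate
    elapsed_us = 1.0d6 * dble(c1 - c0) / dble(rate)
    print *, rate, ' ticks per second'
    print *, elapsed_us, ' microseconds elapsed'
  else
    print *, 'no system clock available'
  endif
  ! printing the sum keeps the loop from being optimized away
  print *, s, ' checksum'
end program
```

The same conversion could be applied to cgpu and chost in the posted program by adding count_rate to one of its system_clock calls.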
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Wed Feb 10, 2010 10:38 am Post subject: |
|
|
Hi gbj,
This looks like a compiler error being caused by the use of the contained subroutine. I've sent a report to our engineers (TPR#16595) and hopefully we can have this fixed soon.
The workaround is to move the contained subroutine to an external subroutine.
Code:
% cat f1.f
program main
use accel_lib
implicit none
integer :: n,n1 ! size of the vector
real,dimension(:),allocatable :: a ! the vector
real,dimension(:),allocatable :: b ! the vector
real,dimension(:),allocatable :: r ! the results
real,dimension(:),allocatable :: e ! expected results
integer :: i,ii,iargc
integer :: c0, c1, c2, c3, cgpu, chost
character(10) :: arg1
if( iargc() .gt. 0 )then
call getarg( 1, arg1 )
read(arg1,'(i10)') n
else
n = 100000
endif
n1 = 1
if( n .le. 0 ) n = 100000
allocate(a(n))
allocate(b(n))
allocate(r(n))
allocate(e(n))
do i = 1,n
a(i) = i*2.0
b(i) = i*2.0
enddo
call acc_init( acc_device_nvidia )
call system_clock( count=c1 )
!$acc region
do i = n1,n
r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
enddo
!$acc end region
call multiply1(r,a,b,n1,n)
call system_clock( count=c2 )
cgpu = c2 - c1
do i = 1,n
e(i) = sin(a(i)) ** 2 + cos(a(i)) ** 2
enddo
call system_clock( count=c3 )
chost = c3 - c2
! check the results
do i = 1,n
if( abs(r(i) - e(i)) .gt. 0.000001 )then
print *, i, r(i), e(i)
endif
enddo
print *, n, ' iterations completed'
print *, cgpu, ' microseconds on GPU'
print *, chost, ' microseconds on host'
end program
subroutine multiply1(r,a,b,n1,n)
implicit none
real,dimension(*) :: a ! the vector
real,dimension(*) :: b ! the vector
real,dimension(*) :: r ! the results
integer :: n, n1, i
! call acc_init( acc_device_nvidia )
!$acc region
do i = n1,n
r(i) = sin(a(i)) ** 2 + cos(b(i)) ** 2
enddo
!$acc end region
end subroutine
% pgf90 -ta=nvidia,time -Minfo=accel f1.f -V10.2 -fastsse -o f1.out
main:
31, Generating copyin(b(1:n))
Generating copyin(a(1:n))
Generating copyout(r(1:n))
32, Loop is parallelizable
Accelerator kernel generated
32, !$acc do parallel, vector(256)
multiply1:
66, Generating copyin(b(n1:n))
Generating copyin(a(n1:n))
Generating copyout(r(n1:n))
67, Loop is parallelizable
Accelerator kernel generated
67, !$acc do parallel, vector(256)
%
% f1.out
100000 iterations completed
2699 microseconds on GPU
1432 microseconds on host
Accelerator Kernel Timing data
/tmp/f1.f
multiply1
66: region entered 1 time
time(us): total=1211
kernels=155 data=1056
67: kernel launched 1 times
grid: [391] block: [256]
time(us): total=155 max=155 min=155 avg=155
/tmp/f1.f
main
31: region entered 1 time
time(us): total=1482
kernels=164 data=1318
32: kernel launched 1 times
grid: [391] block: [256]
time(us): total=164 max=164 min=164 avg=164
acc_init.c
acc_init
41: region entered 1 time
time(us): init=4293831
|
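One difference between the two versions is worth spelling out. In the original program, multiply1 is a contained subroutine with no declarations of its own, so a, b, r, n, n1 and the loop index i are all the host's variables (host association). In the external version above, everything is passed as an argument and i is declared locally. The sketch below only illustrates that language rule with a hypothetical program (host_assoc_demo and both subroutine names are made up for illustration). It makes no claim about the cause of TPR#16595 or about whether a local declaration alone avoids the compiler message.

```fortran
program host_assoc_demo
  implicit none
  integer :: i

  i = -1
  call shares_host_i()
  ! the contained loop ran on the host's i, so it is no longer -1
  print *, 'after shares_host_i, host i = ', i

  i = -1
  call has_local_i()
  ! the contained loop used its own i, so the host's i is still -1
  print *, 'after has_local_i,   host i = ', i

contains

  subroutine shares_host_i()
    ! no declaration of i here: the loop index is the host's i
    do i = 1, 3
    enddo
  end subroutine

  subroutine has_local_i()
    integer :: i   ! local loop index, hides the host's i
    do i = 1, 3
    enddo
  end subroutine

end program
```

Because the directives are not involved, this compiles with any Fortran 90 compiler and can be used to check how a given routine picks up its loop index.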
Thanks,
Mat
|
gbj
Joined: 05 Feb 2010 Posts: 2
|
Posted: Wed Feb 10, 2010 1:28 pm Post subject: |
|
|
Thank you, Mat. Can you please notify me when this update has been applied?
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Thu Feb 11, 2010 2:21 pm Post subject: |
|
|
Hi Gustaaf,
I've added you to the notification list for TPR#16595. I'll also update this post once a fix is available.
- Mat
|