JMa
Joined: 30 Nov 2012 Posts: 14
Posted: Thu Jan 10, 2013 5:17 pm Post subject: reduction within "!$acc kernels loop" ?
Hi All,
Is reduction allowed within !$acc kernels loop ?
I tried to compile the following small code but got many errors.
I know it will work if I replace "kernels" with "parallel" in the $acc line, but as I found in my previous post, "$acc kernels" performs much faster than "$acc parallel":
http://www.pgroup.com/userforum/viewtopic.php?t=3643
So it would be nice if it could work with "$acc kernels loop".
CODE:
tmp=0.d0
call system_clock(count1, count_rate, count_max)
!$acc kernels loop reduction(+:tmp)
do i=1, n_size
do j=1, n_size
do k = 1, n_size
c(i,j) = c(i,j) + a(i,k)*b(k,j)
tmp=tmp+1.d0
enddo
enddo
enddo
print*, 'iteration#:',tmp
call system_clock(count2, count_rate, count_max)
write(*,*)'GPU costs',(count2-count1),'micronseconds'
Thanks,
JMa
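A side note on the timing lines in the snippet above: system_clock reports ticks, and count_rate gives the number of ticks per second, so labelling the raw difference as microseconds is only right where the rate happens to be 1,000,000. A minimal sketch of converting to seconds follows. The names timing_sketch, t_start, t_end, elapsed_seconds, and the placeholder workload are hypothetical and not from the original post.

```fortran
program timing_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: t_start, t_end, count_rate, count_max
  integer :: i
  real(dp) :: work, elapsed_seconds

  call system_clock(t_start, count_rate, count_max)

  ! Placeholder workload (hypothetical); stands in for the timed loop nest.
  work = 0.0_dp
  do i = 1, 1000000
     work = work + sqrt(real(i, dp))
  end do

  call system_clock(t_end, count_rate, count_max)

  ! count_rate is zero when the system has no clock, so guard the division.
  if (count_rate > 0) then
     ! Ticks divided by ticks-per-second gives seconds, whatever the tick length.
     elapsed_seconds = real(t_end - t_start, dp) / real(count_rate, dp)
     print *, 'elapsed (s):', elapsed_seconds
  else
     print *, 'no system clock available'
  end if

  ! Print the result so the workload is not optimized away.
  print *, 'checksum:', work
end program timing_sketch
```

The sketch ignores counter wrap-around at count_max, which would need handling for long runs.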
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Fri Jan 11, 2013 9:56 am
Quote: "Is reduction allowed within !$acc kernels loop ?"
Yes. Though I've complained to our compiler engineers, since we don't print a feedback message when a reduction clause is used. We do when the compiler automatically generates the reduction, but not when it's made explicit. They'll get it fixed.
Here's the output after I remove the "reduction" clause and use just "!$acc kernels loop". Note that the sum reduction is generated for tmp.
Code:
% pgf90 test.f90 -acc -ta=nvidia,4.2,keepgpu -Minfo=accel
testsub:
7, Generating present_or_copy(c(:,:))
Generating present_or_copyin(a(:,:))
Generating present_or_copyin(b(:,:))
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
9, Loop is parallelizable
10, Loop is parallelizable
11, Complex loop carried dependence of 'c' prevents parallelization
Loop carried dependence of 'c' prevents parallelization
Loop carried backward dependence of 'c' prevents vectorization
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
9, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
10, !$acc loop gang ! blockidx%y
11, CC 1.3 : 16 registers; 64 shared, 32 constant, 0 local memory bytes
CC 2.0 : 23 registers; 0 shared, 76 constant, 0 local memory bytes
13, Sum reduction generated for tmp
- Mat
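For readers comparing the two forms discussed in this reply, here is a minimal sketch of the same loop nest with the reduction written explicitly. It assumes an OpenACC-capable compiler for offload (without one the directive is an ordinary comment and the loop runs serially). The program name reduction_sketch, the small size n = 64, and the initial values are hypothetical, and tmp is declared in double precision here so that it matches the 1.d0-style increment. Later in this thread the explicit clause is reported to fail with one particular compiler build, so this shows the intended usage, not a guaranteed workaround.

```fortran
program reduction_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 64          ! hypothetical small size
  real(dp) :: a(n,n), b(n,n), c(n,n)
  real(dp) :: tmp
  integer :: i, j, k

  a = 1.0_dp
  b = 2.0_dp
  c = 0.0_dp
  tmp = 0.0_dp

  ! Explicit sum reduction on the scalar that is read after the loop.
  !$acc kernels loop reduction(+:tmp)
  do i = 1, n
     do j = 1, n
        do k = 1, n
           c(i,j) = c(i,j) + a(i,k)*b(k,j)
           tmp = tmp + 1.0_dp
        end do
     end do
  end do

  ! tmp counts loop-body executions, so it should equal n**3.
  print *, 'iterations counted:', tmp
  print *, 'expected:          ', real(n, dp)**3
end program reduction_sketch
```

Compiling with -acc -Minfo=accel, as in the command above, is how the feedback line "Sum reduction generated for tmp" was obtained for the implicit form.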
JMa
Joined: 30 Nov 2012 Posts: 14
Posted: Fri Jan 11, 2013 10:15 am
Hi Mat,
Nice to see you online and very happy to see your reply.
I tried to recompile it after removing "reduction(+:tmp)"; however, the parallelization failed, reporting:
30, Generating present_or_copy(c(:,:))
Generating present_or_copyin(a(:,:))
Generating present_or_copyin(b(:,:))
31, Loop carried scalar dependence for 'tmp' at line 35
Scalar last value needed after loop for 'tmp' at line 40
Accelerator restriction: scalar variable live-out from loop: tmp
Accelerator scalar kernel generated
32, Loop carried scalar dependence for 'tmp' at line 35
Scalar last value needed after loop for 'tmp' at line 40
Accelerator restriction: scalar variable live-out from loop: tmp
33, Complex loop carried dependence of 'c' prevents parallelization
Loop carried dependence due to exposed use of 'c(i1+1,i2+1)' prevents parallelization
Loop carried scalar dependence for 'tmp' at line 35
Scalar last value needed after loop for 'tmp' at line 40
Accelerator restriction: scalar variable live-out from loop: tmp
In addition, when I have reduction explicitly stated with kernels, I got many errors:
------ Rebuild All started: Project: 2ndOpenACC, Configuration: Debug x64 ------
Deleting intermediate and output files for project '2ndOpenACC', configuration 'Debug'
Compiling Project ...
..\2ndOpenACCProgram.cuf
C:\Users\...\2ndOpenACCProgram.cuf(50) : warning W0093 : Type conversion of expression performed
C:\Users\...\2ndOpenACCProgram.cuf(50) : warning W0093 : Type conversion of expression performed
0 inform, 2 warnings, 0 severes, 0 fatal for example1
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(100): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(100): error: expected a ")"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(98): error: attribute "__global__" does not apply here
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(98): error: attribute "launch_bounds" does not apply here
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(105): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(128): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(129): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(130): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(130): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(131): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(131): error: identifier "S108" is undefined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(132): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(132): error: a value of type "float *" cannot be used to initialize an entity of type "int"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(133): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(133): error: a value of type "float *" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(133): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(134): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(135): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(136): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(137): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(138): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(138): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(139): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(140): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(140): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(141): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(141): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(142): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(142): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(143): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(143): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(144): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(144): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(145): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(145): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(146): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(147): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(147): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(148): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(149): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(149): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(150): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(150): error: variable "rc4" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(151): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(152): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(152): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(152): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(153): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(153): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(154): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(154): error: variable "rc5" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(155): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(156): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(156): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(157): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(157): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(158): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(159): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(159): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(159): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(160): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(160): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(161): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(162): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(162): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(162): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(163): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(163): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(164): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(165): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(165): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(165): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(166): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(166): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(167): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(168): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(168): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(168): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(169): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(169): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(170): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(171): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(171): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(171): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(172): error: explicit type is missing ("int" assumed)
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(172): error: cannot overload functions distinguished by return type alone
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(173): error: expected a declaration
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(174): error: expected an identifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(174): error: a value of type "int" cannot be used to initialize an entity of type "int *"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(174): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(175): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(175): error: variable "b1" has already been defined
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(175): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(176): error: this declaration has no storage class or type specifier
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(176): error: expected a ";"
C:\Users\...\pgcudafor2a4C6bORi-hOB3.gpu(177): error: expected a declaration
96 errors detected in the compilation of "C:\Users\...\pgnvd2a2quIZC_nhXn.nv0".
2ndOpenACC build failed.
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
It seems I got different compilation results from yours... Do you know why this happens?
Thanks and have a nice day,
Jingsen
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Fri Jan 11, 2013 10:23 am
Hi Jingsen,
Can you post your full example to ensure we're compiling the same thing? Also, which compiler version are you using? The second error with the explicit use of the reduction clause looks like a bug where we're generating bad CUDA code. Once I have your example, I'll investigate it.
- Mat
JMa
Joined: 30 Nov 2012 Posts: 14
Posted: Fri Jan 11, 2013 10:37 am
Hi Mat,
The version was downloaded on Dec. 20, 2012, with Intel VF shell 2010: "pgivfx64-vs2010-1210.exe".
Following is the sample program.
Thanks,
Jingsen
! matrix-acc.f
program example1
parameter ( n_size=2000 )
real*8, dimension(:,:) :: a(n_size,n_size)
real*8, dimension(:,:) :: b(n_size,n_size)
real*8, dimension(:,:) :: c(n_size,n_size)
real*8, dimension(:,:) :: d(n_size,n_size)
character(10) :: time
real tmp
integer count1, count2, count_rate, count_max
! Initialize matrices (values differ from C version)
do i=1, n_size
do j=1, n_size
a(i,j) = i + j;
b(i,j) = i - j;
enddo
enddo
c=0.d0
d=0.d0
tmp=0.d0
call system_clock(count1, count_rate, count_max)
!$acc kernels loop !reduction(+:tmp)
do i=1, n_size
do j=1, n_size
do k = 1, n_size
c(i,j) = c(i,j) + a(i,k)*b(k,j)
tmp=tmp+1.d0
enddo
enddo
enddo
print*, 'iternation#:',tmp
call system_clock(count2, count_rate, count_max)
write(*,*)'GPU costs',(count2-count1),'micronseconds'
tmp=0.d0
call system_clock(count1, count_rate, count_max)
do i=1, n_size
do j=1, n_size
do k = 1, n_size
d(i,j) = d(i,j) + a(i,k)*b(k,j)
tmp=tmp+1.d0
enddo
enddo
enddo
call system_clock(count2, count_rate, count_max)
write(*,*)'CPU costs',(count2-count1),'micronseconds'
! check the results
do i=1, n_size
do j=1, n_size
if( c(i,j) .ne. d(i,j) )then
print *, i,j, c(i,j), d(i,j)
stop 'error found'
endif
enddo
enddo
print *, n_size*n_size, 'iterations completed'
end program example1
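One detail in the program above that may be worth checking: tmp is declared as default real (normally single precision) while the increment is the double-precision literal 1.d0, and the loop nest runs 2000**3 = 8,000,000,000 iterations. A single-precision value cannot count past 2**24 = 16,777,216 in steps of one, so a serial count stops increasing there. Whether this kind mismatch is also behind the differing compiler feedback is not established in this thread. The sketch below only demonstrates the arithmetic, assuming a 32-bit default real and using the hypothetical names counter_precision_sketch, count_sp, count_dp, and a smaller trip count of 20,000,000.

```fortran
program counter_precision_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n_iter = 20000000   ! hypothetical trip count, above 2**24
  real     :: count_sp                      ! default real, as in 'real tmp'
  real(dp) :: count_dp                      ! double-precision counterpart
  integer  :: i

  count_sp = 0.0
  count_dp = 0.0_dp
  do i = 1, n_iter
     ! The sum is formed in double precision, then converted back to single.
     count_sp = count_sp + 1.d0
     count_dp = count_dp + 1.d0
  end do

  print *, 'single-precision counter:', count_sp
  print *, 'double-precision counter:', count_dp
end program counter_precision_sketch
```

With IEEE arithmetic and no value-changing optimizations, the single-precision counter typically stalls at 16777216 while the double-precision one reaches 20000000. Declaring the counter with the same kind as the increment avoids both the stall and the implicit conversion.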