mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Thu Jan 24, 2013 2:24 pm
Hi Ping,
Could you post an example of the Minfo output as well as a reproducing example? This will help me answer your first two questions.
For the third: yes, there is some overhead in performing the present look-up, but it is fairly small.
- Mat
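To illustrate the point about the present look-up, here is a minimal sketch that is not taken from the thread. The module name `present_demo`, the routine `accumulate_product`, and the size `n = 4` are made up for illustration. It assumes the common pattern where one data region in the caller encloses several calls, so that each call only pays for the look-up done by `present` and not for a transfer.

```fortran
module present_demo
  implicit none
  integer, parameter :: n = 4              ! hypothetical small size
  real(8), allocatable :: a(:,:), b(:,:), c(:,:)
contains
  ! Adds the product a*b into c.
  ! Assumes an enclosing data region has already placed a, b and c on the
  ! device; the present clause performs a look-up at every call.
  subroutine accumulate_product()
    integer :: i, j, k
    !$acc kernels loop present(a, b, c)
    do j = 1, n
      do i = 1, n
        do k = 1, n
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine accumulate_product
end module present_demo

program present_lookup
  use present_demo
  implicit none
  integer :: iter
  allocate(a(n,n), b(n,n), c(n,n))
  a = 1.0d0
  b = 2.0d0
  c = 0.0d0
  ! One data region around all calls: the arrays are transferred once on
  ! entry and c is copied back once on exit.
  !$acc data copyin(a, b) copy(c)
  do iter = 1, 3
    call accumulate_product()
  end do
  !$acc end data
  print *, 'c(1,1) =', c(1,1)
end program present_lookup
```

Without OpenACC enabled the directives are plain comments, so the program also runs on the host and can be used to check the arithmetic before timing anything on a device.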
appleluo
Joined: 21 Nov 2012 Posts: 19
Posted: Fri Jan 25, 2013 9:24 am
Hi Mat,
Here is an example.
========= Begin program ==========
module mod1
  real*8, allocatable :: a(:,:), b(:,:), c(:,:)
end module mod1
program prog1
  use mod1
  allocate(a(100,100),b(100,100),c(100,100))
  c=0.0d0
  a=1.13240d0
  b=2.33413d0
  call sub1
end program prog1
subroutine sub1
  use mod1
  integer i,j,k
  !$acc data copyin(a,b) copy(c)
  !$acc kernels loop present(a, b, c)
  do j=1,100
    do i=1,100
      do k=1,100
        c(i,j) = c(i,j)+a(i,k)*b(k,j)
      enddo
    enddo
  enddo
  !$acc end kernels
  !$acc end data
end subroutine sub1
=========End of program===============
======== Begin compiler output ===========
pgfortran -acc -Minfo main.f90
prog1:
9, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
10, Memory set idiom, array assignment replaced by call to pgf90_mset8
11, Memory set idiom, array assignment replaced by call to pgf90_mset8
sub1:
20, Generating copyin(b(:,:))
Generating copyin(a(:,:))
Generating copy(c(:,:))
22, Generating present_or_copy(c(:,:))
Generating present_or_copyin(b(:,:))
Generating present_or_copyin(a(:,:))
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
23, Loop is parallelizable
24, Loop is parallelizable
25, Complex loop carried dependence of 'c' prevents parallelization
Loop carried dependence of 'c' prevents parallelization
Loop carried backward dependence of 'c' prevents vectorization
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
23, !$acc loop gang ! blockidx%y
24, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
25, CC 1.3 : 17 registers; 136 shared, 4 constant, 0 local memory bytes
CC 2.0 : 33 registers; 0 shared, 152 constant, 0 local memory bytes
==========End compiler output==================
If I delete present(a, b, c) from the kernels loop directive, the compiler output for sub1 is as follows:
sub1:
20, Generating copyin(b(:,:))
Generating copyin(a(:,:))
Generating copy(c(:,:))
22, Generating copy(c(:,:))
Generating copyin(a(:,:))
Generating copyin(b(:,:))
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
Thanks,
Ping
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Jan 28, 2013 11:42 am
Hi Ping,
You must be using an older version of the compiler. The Minfo messages were not updated to reflect the "present_or_copy.." change when it was made in the 12.6 release. This was corrected in the 12.9 release.
Here's the output from 12.8 and 12.9:
% pgf90 -acc -Minfo test2.f90 -V12.8
prog1:
9, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
10, Memory set idiom, array assignment replaced by call to pgf90_mset8
11, Memory set idiom, array assignment replaced by call to pgf90_mset8
sub1:
20, Generating copyin(b(:,:))
Generating copyin(a(:,:))
Generating copy(c(:,:))
22, Generating copy(c(:,:))
Generating copyin(a(:,:))
Generating copyin(b(:,:))
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
24, Loop is parallelizable
25, Loop is parallelizable
26, Complex loop carried dependence of 'c' prevents parallelization
Loop carried dependence of 'c' prevents parallelization
Loop carried backward dependence of 'c' prevents vectorization
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
24, !$acc loop gang ! blockidx%y
25, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
26, CC 1.3 : 17 registers; 128 shared, 4 constant, 0 local memory bytes
CC 2.0 : 33 registers; 0 shared, 144 constant, 0 local memory bytes
% pgf90 -acc -Minfo test2.f90 -V12.9
prog1:
9, Memory zero idiom, array assignment replaced by call to pgf90_mzero8
10, Memory set idiom, array assignment replaced by call to pgf90_mset8
11, Memory set idiom, array assignment replaced by call to pgf90_mset8
sub1:
20, Generating copyin(b(:,:))
Generating copyin(a(:,:))
Generating copy(c(:,:))
22, Generating present_or_copy(c(:,:))
Generating present_or_copyin(a(:,:))
Generating present_or_copyin(b(:,:))
Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
24, Loop is parallelizable
25, Loop is parallelizable
26, Complex loop carried dependence of 'c' prevents parallelization
Loop carried dependence of 'c' prevents parallelization
Loop carried backward dependence of 'c' prevents vectorization
Inner sequential loop scheduled on accelerator
Accelerator kernel generated
24, !$acc loop gang ! blockidx%y
25, !$acc loop gang, vector(128) ! blockidx%x threadidx%x
26, CC 1.3 : 17 registers; 112 shared, 4 constant, 0 local memory bytes
CC 2.0 : 42 registers; 0 shared, 128 constant, 0 local memory bytes
Sorry for the confusion,
Mat
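To make the "present_or_copy.." behavior mentioned in the reply above concrete, here is a minimal sketch that is not taken from the thread. The names `pcopy_demo` and `matmul_acc` and the size `n = 4` are hypothetical. It assumes, as the reply suggests, that data clauses on a compute construct have present-or semantics from release 12.6 on, and it spells that out with the explicit `present_or_copyin` and `present_or_copy` clauses from OpenACC 1.0.

```fortran
module pcopy_demo
  implicit none
contains
  ! Adds the product a*b into c.
  ! The explicit present_or_* clauses mean: use the device copy if an
  ! enclosing data region already created one, otherwise allocate and copy.
  subroutine matmul_acc(a, b, c, n)
    integer, intent(in)    :: n
    real(8), intent(in)    :: a(n,n), b(n,n)
    real(8), intent(inout) :: c(n,n)
    integer :: i, j, k
    !$acc kernels loop present_or_copyin(a, b) present_or_copy(c)
    do j = 1, n
      do i = 1, n
        do k = 1, n
          c(i,j) = c(i,j) + a(i,k)*b(k,j)
        end do
      end do
    end do
  end subroutine matmul_acc
end module pcopy_demo

program pcopy_example
  use pcopy_demo
  implicit none
  integer, parameter :: n = 4              ! hypothetical small size
  real(8) :: a(n,n), b(n,n), c(n,n)
  a = 1.0d0
  b = 2.0d0
  c = 0.0d0
  ! Case 1: no enclosing data region, so the clauses fall back to copying.
  call matmul_acc(a, b, c, n)
  ! Case 2: the data region owns the transfers, so inside the call the
  ! arrays are found present and no further copies are made.
  !$acc data copyin(a, b) copy(c)
  call matmul_acc(a, b, c, n)
  !$acc end data
  print *, 'c(1,1) =', c(1,1)
end program pcopy_example
```

Under this reading, the data-region lines (20 in the listings above) describe the actual transfers, while the present-or messages at the compute construct (22 in the 12.9 listing) describe a conditional copy that is skipped when the data is already on the device.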