PGI User Forum

loop is parallelizable

 
PGI User Forum Forum Index -> Accelerator Programming
WENYANG LIU



Joined: 26 Sep 2010
Posts: 11

Posted: Tue Oct 19, 2010 11:44 am    Post subject: loop is parallelizable

Hi everyone,

I tried pgfortran v10.3 and v10.9 to compile my code.
The compilation messages always contain:
Code:

Loop is parallelizable

But the corresponding loops are not actually parallelized.
Is this a problem with my code or the compiler?
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Oct 19, 2010 4:03 pm

Quote:
Is this a problem with my code or the compiler?
Sorry, I'll need more information. What other informational messages are printed? Can you post an example?

- Mat
WENYANG LIU



Joined: 26 Sep 2010
Posts: 11

Posted: Tue Oct 19, 2010 5:06 pm

Hi Mat,

Here is an example:
Code:

      program para

      implicit none

      integer::f(10),f_j(10),fsum(10)
      integer::i,j
      fsum=0
!$acc data region local(f),copy(fsum)
!$acc region
!$acc do private(f_j)
      do i=1,10
         f(i)=i
         do j=1,10
            f_j(j)=f(i)+10
         end do
         fsum(i)=sum(f_j)
      end do
!$acc end region
!$acc end data region

      write(*,*)fsum

      end program



Code:

para:
      8, Generating local(f(:))
         Generating copy(fsum(:))
      9, Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     11, Loop is parallelizable
         Accelerator kernel generated
         11, !$acc do parallel, vector(10)
             CC 1.0 : 14 registers; 20 shared, 32 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 14 registers; 20 shared, 32 constant, 0 local memory bytes; 25 occupancy
     13, Loop is parallelizable
     16, sum reduction inlined
         Loop is parallelizable


I used "do private" on the i loop because the compiler reported that the loop requires privatization of array 'f_j(1:10)'.

Thanks.
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Wed Oct 20, 2010 10:02 am

Hi WENYANG LIU,

In this case, although the "j" loop and the sum reduction are parallelizable, the compiler has determined that the optimal schedule is to create a kernel only for the "i" loop, with the rest of the loop body (the "j" loop and the sum) running inside that kernel. The "Loop is parallelizable" message means the loop can be parallelized, not that a separate kernel was generated for it. The only alternative schedule would be to break the loop into three kernels, in which case you'll need to rewrite the code a bit:
Code:
$ cat test.f90

      program para

      implicit none

      integer::f(10),f_j(10,10),fsum(10)
      integer::i,j
      fsum=0
!$acc data region local(f),copy(fsum)
!$acc region
      do i=1,10
         f(i)=i
      enddo
      do i=1,10
         do j=1,10
            f_j(i,j)=f(i)+10
         end do
      enddo
      do i=1,10
         fsum(i)=sum(f_j(i,:))
      end do
!$acc end region
!$acc end data region

      write(*,*)fsum

      end program


$ pgf90 -ta=nvidia -Minfo=accel test.f90
para:
      9, Generating local(f(:))
         Generating copy(fsum(:))
     10, Generating copyout(f_j(1:10,1:10))
         Generating compute capability 1.0 binary
         Generating compute capability 1.3 binary
     11, Loop is parallelizable
         Accelerator kernel generated
         11, !$acc do parallel, vector(10)
             CC 1.0 : 4 registers; 20 shared, 52 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 4 registers; 20 shared, 52 constant, 0 local memory bytes; 25 occupancy
     14, Loop is parallelizable
     15, Loop is parallelizable
         Accelerator kernel generated
         14, !$acc do parallel, vector(10)
             Cached references to size [10] block of 'f'
         15, !$acc do parallel, vector(10)
             CC 1.0 : 6 registers; 64 shared, 52 constant, 0 local memory bytes; 100 occupancy
             CC 1.3 : 6 registers; 64 shared, 52 constant, 0 local memory bytes; 100 occupancy
     19, Loop is parallelizable
         Accelerator kernel generated
         19, !$acc do parallel, vector(10)
             CC 1.0 : 8 registers; 20 shared, 48 constant, 0 local memory bytes; 33 occupancy
             CC 1.3 : 8 registers; 20 shared, 48 constant, 0 local memory bytes; 25 occupancy
     20, Loop is parallelizable
$ a.out
          110          120          130          140          150          160
          170          180          190          200


- Mat
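
As a quick sanity check on the rewrite above, the following host-only sketch runs both loop structures without any accelerator directives and compares the results. The program name `compare_schedules` and the variables `f_j2`, `fsum_fused`, and `fsum_split` are made up for this illustration and do not appear in the posts above. It only confirms that the fused and split forms compute the same values; it says nothing about how the compiler schedules them on the GPU.

```fortran
program compare_schedules
   implicit none

   integer, parameter :: n = 10
   integer :: f(n), f_j(n), f_j2(n, n)
   integer :: fsum_fused(n), fsum_split(n)
   integer :: i, j

   ! Fused form (first listing): one outer loop; the inner loop and the
   ! sum are part of each outer iteration, so f_j is reused per iteration.
   fsum_fused = 0
   do i = 1, n
      f(i) = i
      do j = 1, n
         f_j(j) = f(i) + 10
      end do
      fsum_fused(i) = sum(f_j)
   end do

   ! Split form (second listing): three separate loop nests, with f_j
   ! promoted to two dimensions so that each i has its own row.
   fsum_split = 0
   do i = 1, n
      f(i) = i
   end do
   do i = 1, n
      do j = 1, n
         f_j2(i, j) = f(i) + 10
      end do
   end do
   do i = 1, n
      fsum_split(i) = sum(f_j2(i, :))
   end do

   ! Both forms should give 10*(i+10) for each i.
   if (all(fsum_fused == fsum_split)) then
      write(*,*) 'results match'
   else
      write(*,*) 'results differ'
   end if
   write(*,*) fsum_split
end program compare_schedules
```

Any Fortran compiler should build this as ordinary host code. Each element works out to 10*(i+10), so the values run from 110 to 200, the same as the a.out output posted above.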
All times are GMT - 7 Hours