PGI User Forum

How to parallel outer loop only

 
Forum Index -> Accelerator Programming
Tingwen



Joined: 16 Mar 2010
Posts: 3

Posted: Tue Mar 16, 2010 12:58 pm    Post subject: How to parallel outer loop only

Hi,
I am trying to add accelerator directives to a subroutine with nested loops. The limits of the inner loops vary with each iteration of the outer loop. Can I parallelize only the outer loop, to get around the restriction that inner loop limits must be constant? The code is below. Any suggestions are appreciated.


Code:
      SUBROUTINE GRID_BASED_NEIGHBOR_SEARCH
     
      USE param1
      USE discretelement
      USE geometry
      USE des_bc

      IMPLICIT NONE
!-----------------------------------------------
! Local variables
!-----------------------------------------------
      INTEGER I, II, IP1, IM1   ! X-coordinate loop indices
      INTEGER J, JJ, JP1, JM1   ! Y-coordinate loop indices
      INTEGER K, KK, KP1, KM1   ! Z-coordinate loop indices
      INTEGER PNO ! Temp. particle number variable
      INTEGER NPG ! Temp. cell particle count
      INTEGER LL, NP, NEIGH_L  ! Loop Counters
      INTEGER NLIM !
     
      DOUBLE PRECISION DISTVEC(DIMN), DIST, R_LM ! Contact variables

!$acc region do kernel copy(neighbours) copyin(pijk,des_pos_new) &
!$acc       copyin(imin1,imax1,jmin1,jmax1,dimn,kmin1,kmax1)     & 
!$acc       copyin(des_radius,factor_rlm)
      DO LL = 1, MAX_PIS

         II = PIJK(LL,1); IP1=min(II+1,imax1); IM1=max(II-1,imin1)
         JJ = PIJK(LL,2); JP1=min(JJ+1,jmax1); JM1=max(JJ-1,jmin1)
         KK = PIJK(LL,3); KP1=KK;   KM1=KK
         IF(DIMN.EQ.3)THEN
            KP1 = min(KK+1,kmax1);   KM1 = max(KK-1,kmin1)
         ENDIF

         DO KK = KM1, KP1
            DO JJ = JM1, JP1
               DO II = IM1, IP1
! Shift loop index to new variables for manipulation
                  I = II;   J = JJ;   K = KK
! If cell IJK contains particles, store the amount in NPG
                  IF(ASSOCIATED(PIC(I,J,K)%P))THEN
                     NPG = SIZE(PIC(I,J,K)%P)
                  ELSE
                     NPG = 0
                  ENDIF

! Loop over the particles in IJK cell to determine if they are
! neighbors to particle LL
                  DO NP = 1,NPG
                     PNO = PIC(I,J,K)%P(NP)

                     IF(PNO.GT.LL)THEN
                        R_LM = DES_RADIUS(LL) + DES_RADIUS(PNO)
                        R_LM = FACTOR_RLM*R_LM
                        DISTVEC(:) = DES_POS_NEW(PNO,:) - DES_POS_NEW(LL,:)
                        if(dimn.eq.2)then
                           dist=sqrt(distvec(1)**2+distvec(2)**2)
                        else
                           dist=sqrt(distvec(1)**2+distvec(2)**2+distvec(3)**2)
                        endif

                        IF(DIST .LE. R_LM) then
                            NEIGHBOURS(LL,1) = NEIGHBOURS(LL,1) + 1
                            NLIM  = NEIGHBOURS(LL,1) + 1
                            NEIGHBOURS(LL,NLIM) = PNO
                        ENDIF  !contact condition
                     ENDIF  !PNO.GT.LL
                  ENDDO  !NP

               ENDDO  ! II cell loop
            ENDDO  ! JJ cell loop
         ENDDO  ! KK cell loop

      ENDDO  ! Particles in system loop
!$acc end region

      END SUBROUTINE GRID_BASED_NEIGHBOR_SEARCH
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Tue Mar 16, 2010 2:41 pm

Hi Tingwen,

You should be able to work around the rectangular loop restriction using the "kernel" clause (as you have it now). However, you'll need to remove "ASSOCIATED", as it isn't supported on the GPU. You'll also need to privatize DISTVEC (i.e., add "private(DISTVEC)" alongside your kernel clause).

Let me know if I missed anything by posting the output from your compile with "-Minfo=accel".

Hope this helps,
Mat
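
One possible way to drop ASSOCIATED and SIZE, sketched here as an assumption rather than as what was actually done in this thread, is to flatten the per-cell pointer lists PIC(I,J,K)%P into plain integer arrays before entering the accelerator region. The names build_cell_lists, cell_of, cell_start, cell_count and cell_list are hypothetical, and the cells are assumed to be numbered linearly from 1 to ncell (with ncell at least 1).

```fortran
module cell_lists_sketch
  implicit none
contains
  ! Hypothetical helper: group particle numbers by cell in flat arrays.
  !   cell_of(p)    linear cell index (1..ncell) of particle p       (input)
  !   cell_count(c) number of particles in cell c                    (output)
  !   cell_start(c) position in cell_list of the first particle of c (output)
  !   cell_list(:)  particle numbers, grouped cell by cell           (output)
  subroutine build_cell_lists(cell_of, ncell, cell_start, cell_count, cell_list)
    integer, intent(in)  :: cell_of(:)
    integer, intent(in)  :: ncell
    integer, intent(out) :: cell_start(ncell), cell_count(ncell)
    integer, intent(out) :: cell_list(size(cell_of))
    integer :: next(ncell)   ! next free slot for each cell
    integer :: p, c

    ! Pass 1: count the particles in each cell.
    cell_count = 0
    do p = 1, size(cell_of)
      c = cell_of(p)
      cell_count(c) = cell_count(c) + 1
    end do

    ! Running sum of the counts gives each cell's starting offset.
    cell_start(1) = 1
    do c = 2, ncell
      cell_start(c) = cell_start(c-1) + cell_count(c-1)
    end do

    ! Pass 2: drop each particle into the next free slot of its cell.
    next = cell_start
    do p = 1, size(cell_of)
      c = cell_of(p)
      cell_list(next(c)) = p
      next(c) = next(c) + 1
    end do
  end subroutine build_cell_lists
end module cell_lists_sketch
```

Inside the particle loop, NPG would then become cell_count(c) and PNO would become cell_list(cell_start(c) + NP - 1) for cell c, so an empty cell is simply a count of zero and no pointer intrinsics remain in the region.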
Tingwen



Joined: 16 Mar 2010
Posts: 3

Posted: Tue Mar 16, 2010 3:22 pm

Hi Mat,
Thanks for your prompt reply. I made the changes and commented out the "associated" function, setting NPG and PNO to constants for now. Below is the output when I compile it with PGI 10.3. Do you have any idea what is wrong? Thanks

Code:
     22, No parallel kernels found, accelerator region ignored
     25, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
     55, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     56, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     57, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     70, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
         Complex loop carried dependence of 'neighbours' prevents parallelization
         Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
     76, Loop is parallelizable
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Tue Mar 16, 2010 4:02 pm

Hi Tingwen,

Did you add the private clause for DISTVEC?

Try replacing your "$acc region" lines with a simpler version:
Code:
!$acc region
!$acc do kernel private(DISTVEC)


This works for me, but I did have to modify your code to work around your modules. It's possible my changes affected the behavior. If this is the case, please send the full source to PGI Customer Support (trs@pgroup.com) and ask them to send it on to me.

- Mat
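
As a rough illustration of where those two directive lines sit, here is a minimal sketch with the same shape as the original problem: an outer loop marked as the kernel, an inner loop whose limits depend on the outer index, and a small work array made private. The routine name count_close_pairs and its arguments are hypothetical, and the sketch has not been checked against a PGI compiler. A compiler that does not recognise the directives treats them as comments and runs the loop serially.

```fortran
module outer_kernel_sketch
  implicit none
contains
  ! Hypothetical example: for each point i, count the later points j > i
  ! that lie within "cutoff" of it. pos is dimensioned (n, 3).
  subroutine count_close_pairs(pos, cutoff, ncount)
    double precision, intent(in)  :: pos(:,:)
    double precision, intent(in)  :: cutoff
    integer,          intent(out) :: ncount(:)
    double precision :: distvec(3), dist
    integer :: i, j, n

    n = size(pos, 1)
!$acc region
!$acc do kernel private(distvec)
    do i = 1, n                 ! only this loop is mapped to the kernel
      ncount(i) = 0
      do j = i + 1, n           ! limits depend on i, so this loop stays sequential
        distvec(:) = pos(j,:) - pos(i,:)
        dist = sqrt(distvec(1)**2 + distvec(2)**2 + distvec(3)**2)
        if (dist <= cutoff) ncount(i) = ncount(i) + 1
      end do
    end do
!$acc end region
  end subroutine count_close_pairs
end module outer_kernel_sketch
```

Each outer iteration writes only ncount(i) and its own copy of distvec, which is the property the private clause is meant to express.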
Tingwen



Joined: 16 Mar 2010
Posts: 3

Posted: Wed Mar 17, 2010 5:47 am

Hi Mat,
Many thanks. I figured out a way to do it by replacing the DISTVEC vector with three scalars. It now compiles successfully. I really appreciate your help.

Tingwen
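
The modified code was not posted, so the following is only a guess at what "three scalars" might look like for the distance calculation. The names pair_distance, dx, dy and dz are hypothetical, and pos stands in for DES_POS_NEW. The function wrapper is there only to make the sketch callable. In the loop from this thread, the assignments would sit directly in the loop body in place of the DISTVEC(:) line.

```fortran
module scalar_dist_sketch
  implicit none
contains
  ! Hypothetical example: distance between particles ll and pno, computed
  ! with three scalars instead of a temporary array such as DISTVEC.
  function pair_distance(pos, ll, pno, dimn) result(dist)
    double precision, intent(in) :: pos(:,:)   ! positions, dimensioned (n, dimn)
    integer,          intent(in) :: ll, pno    ! the two particle numbers
    integer,          intent(in) :: dimn       ! 2 or 3 spatial dimensions
    double precision :: dist
    double precision :: dx, dy, dz

    dx = pos(pno,1) - pos(ll,1)
    dy = pos(pno,2) - pos(ll,2)
    if (dimn == 2) then
      dist = sqrt(dx*dx + dy*dy)
    else
      dz = pos(pno,3) - pos(ll,3)
      dist = sqrt(dx*dx + dy*dy + dz*dz)
    end if
  end function pair_distance
end module scalar_dist_sketch
```

With no array temporary left in the loop, the 'distvec(1:3)' messages from the earlier compiler output have nothing to refer to, which is consistent with the report that the code then compiled.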
All times are GMT - 7 Hours