Tingwen
Joined: 16 Mar 2010 Posts: 3
Posted: Tue Mar 16, 2010 12:58 pm Post subject: How to parallelize the outer loop only
Hi,
I am trying to add accelerator directives to a subroutine with nested loops. The limits of the inner loops vary with the outer loop index. Can I parallelize the outer loop only, to work around the restriction that inner loop limits must be constant? The code is attached below. Any suggestion is appreciated.
Code:
SUBROUTINE GRID_BASED_NEIGHBOR_SEARCH
   USE param1
   USE discretelement
   USE geometry
   USE des_bc
   IMPLICIT NONE
!-----------------------------------------------
! Local variables
!-----------------------------------------------
   INTEGER I, II, IP1, IM1      ! X-coordinate loop indices
   INTEGER J, JJ, JP1, JM1      ! Y-coordinate loop indices
   INTEGER K, KK, KP1, KM1      ! Z-coordinate loop indices
   INTEGER PNO                  ! Temp. particle number variable
   INTEGER NPG                  ! Temp. cell particle count
   INTEGER LL, NP, NEIGH_L      ! Loop Counters
   INTEGER NLIM
   DOUBLE PRECISION DISTVEC(DIMN), DIST, R_LM ! Contact variables
!$acc region do kernel copy(neighbours) copyin(pijk,des_pos_new) &
!$acc copyin(imin1,imax1,jmin1,jmax1,dimn,kmin1,kmax1) &
!$acc copyin(des_radius,factor_rlm)
   DO LL = 1, MAX_PIS
      II = PIJK(LL,1); IP1=min(II+1,imax1); IM1=max(II-1,imin1)
      JJ = PIJK(LL,2); JP1=min(JJ+1,jmax1); JM1=max(JJ-1,jmin1)
      KK = PIJK(LL,3); KP1=KK; KM1=KK
      IF(DIMN.EQ.3)THEN
         KP1 = min(KK+1,kmax1); KM1 = max(KK-1,kmin1)
      ENDIF
      DO KK = KM1, KP1
         DO JJ = JM1, JP1
            DO II = IM1, IP1
               ! Shift loop index to new variables for manipulation
               I = II; J = JJ; K = KK
               ! If cell IJK contains particles, store the amount in NPG
               IF(ASSOCIATED(PIC(I,J,K)%P))THEN
                  NPG = SIZE(PIC(I,J,K)%P)
               ELSE
                  NPG = 0
               ENDIF
               ! Loop over the particles in IJK cell to determine if they are
               ! neighbors to particle LL
               DO NP = 1,NPG
                  PNO = PIC(I,J,K)%P(NP)
                  IF(PNO.GT.LL)THEN
                     R_LM = DES_RADIUS(LL) + DES_RADIUS(PNO)
                     R_LM = FACTOR_RLM*R_LM
                     DISTVEC(:) = DES_POS_NEW(PNO,:) - DES_POS_NEW(LL,:)
                     if(dimn.eq.2)then
                        dist=sqrt(distvec(1)**2+distvec(2)**2)
                     else
                        dist=sqrt(distvec(1)**2+distvec(2)**2+distvec(3)**2)
                     endif
                     IF(DIST .LE. R_LM) then
                        NEIGHBOURS(LL,1) = NEIGHBOURS(LL,1) + 1
                        NLIM = NEIGHBOURS(LL,1) + 1
                        NEIGHBOURS(LL,NLIM) = PNO
                     ENDIF !contact condition
                  ENDIF !PNO.GT.LL
               ENDDO !NP
            ENDDO ! II cell loop
         ENDDO ! JJ cell loop
      ENDDO ! KK cell loop
   ENDDO ! Particles in system loop
!$acc end region
END SUBROUTINE GRID_BASED_NEIGHBOR_SEARCH
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Mar 16, 2010 2:41 pm Post subject: |
Hi Tingwen,
You should be able to work around the rectangular loop restriction using the "kernel" clause (as you have it now). However, you'll need to remove "ASSOCIATED", as it isn't supported on the GPU. Also, you'll need to privatize DISTVEC (i.e., add "private(DISTVEC)" to your kernel clause).
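For example, the first directive line could pick up the private clause like this (just a sketch based on the directive you posted, not something I've compiled against your full source):

Code:
!$acc region do kernel copy(neighbours) copyin(pijk,des_pos_new) &
!$acc copyin(imin1,imax1,jmin1,jmax1,dimn,kmin1,kmax1) &
!$acc copyin(des_radius,factor_rlm) private(distvec)

For the ASSOCIATED call, one option (an assumption on my part, not part of your posted code) is to precompute an integer particle count per cell on the host and test that inside the region instead of querying the pointer component.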
If I've missed anything, let me know by posting the output from your compile with "-Minfo=accel".
Hope this helps,
Mat
Tingwen
Joined: 16 Mar 2010 Posts: 3
Posted: Tue Mar 16, 2010 3:22 pm Post subject: |
Hi Mat,
Thanks for your prompt reply. I made the changes and commented out the "ASSOCIATED" call by setting NPG and PNO to constants. Below is the output when I compile it with PGI 10.3. Do you have any idea what is wrong? Thanks
Code:
22, No parallel kernels found, accelerator region ignored
25, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
55, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
56, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
57, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
70, Loop carried dependence due to exposed use of 'distvec(1:3)' prevents parallelization
Complex loop carried dependence of 'neighbours' prevents parallelization
Loop carried dependence due to exposed use of 'neighbours(i1+1,1)' prevents parallelization
76, Loop is parallelizable
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Mar 16, 2010 4:02 pm Post subject: |
Hi Tingwen,
Did you add the private clause for DISTVEC?
Try replacing your "!$acc region" lines with a simpler version:
Code:
!$acc region
!$acc do kernel private(DISTVEC)
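In the subroutine you posted, those two lines would sit directly around the outer particle loop, roughly like this (sketch only; the loop body stays as before):

Code:
!$acc region
!$acc do kernel private(DISTVEC)
DO LL = 1, MAX_PIS
   ! ... neighbor-search body unchanged ...
ENDDO ! Particles in system loop
!$acc end region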
This works for me, but I did have to modify your code to work around your modules. It's possible my changes affected the behavior. If that's the case, please send the full source to PGI Customer Support (trs@pgroup.com) and ask them to send it on to me.
- Mat
Tingwen
Joined: 16 Mar 2010 Posts: 3
Posted: Wed Mar 17, 2010 5:47 am Post subject: |
Hi Mat,
Many thanks. I figured out a way to do it by replacing the vector with three scalars. Now it compiles successfully. Really appreciate your help.
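The change amounted to something like this (a sketch of the idea, with illustrative names):

Code:
DOUBLE PRECISION DISTX, DISTY, DISTZ ! replaces DOUBLE PRECISION DISTVEC(DIMN)

! Inside the IF(PNO.GT.LL) block, instead of DISTVEC(:) = ...
DISTX = DES_POS_NEW(PNO,1) - DES_POS_NEW(LL,1)
DISTY = DES_POS_NEW(PNO,2) - DES_POS_NEW(LL,2)
DISTZ = 0.D0
IF(DIMN.EQ.3) DISTZ = DES_POS_NEW(PNO,3) - DES_POS_NEW(LL,3)
DIST = SQRT(DISTX**2 + DISTY**2 + DISTZ**2)

With the array gone, the private(DISTVEC) clause is no longer needed on the directive, since scalars assigned inside the loop are private to each iteration.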
Tingwen