PGI User Forum


Finding Minimum Values using Parallel Loops

 
PGI User Forum Index -> Accelerator Programming
chris.sl.lim



Joined: 11 Jan 2013
Posts: 15

Posted: Tue Apr 02, 2013 8:36 am    Post subject: Finding Minimum Values using Parallel Loops

Hi all,

I have a double-nested loop in Fortran that computes distances from coordinates stored in arrays and keeps track of the minimum value and where it occurs.

I was wondering whether there is any way of using GPU parallelisation to speed this up. PGPROF reports a compute intensity of 4.17, and this loop accounts for a major chunk of my program's run-time.

I tried splitting the loop so that the results are stored in temporary arrays (and then searching these for the minimum value), moving the IF statements outside the main loop to remove the scalar dependency, but then privatisation of these arrays prevented parallelisation.

Is there a better way to go about this, or is this situation simply not suited to parallelisation because all the results need to be stored?

Chris


Code:
      DO 200 KWALL  = KS,KE,1
      KM1 = KWALL-1
      IF(KM1.LT.1)    KM1 = 1
      KP1 = KWALL
      IF(KP1.GT.KMM1) KP1 = KMM1
      DO 200 JWALL  = JS,JE,1
      JM1 = JWALL-1
      IF(JM1.LT.1)    JM1 = 1
      JP1 = JWALL
      IF(JP1.GT.JMM1) JP1 = JMM1
!
!      FIRST THE I = 1 WALL
!
      FSOLID = 1.0 -0.25*(MWALLI1(JM1,KM1,NBLCK)+MWALLI1(JP1,KP1,NBLCK) &
                        + MWALLI1(JM1,KP1,NBLCK)+MWALLI1(JP1,KM1,NBLCK))
      FSOLID = FSOLID*I1_SHEAR(NBLCK)
      XD  = X(1,JWALL,KWALL,NBLCK)  - XP
      RD  = R(1,JWALL,KWALL,NBLCK)  - RP
      RTD = RT(1,JWALL,KWALL,NBLCK) - RTP
      DISTSQ = XD*XD + RD*RD + RTD*RTD
      DISTSQ = FSOLID*DISTSQ + (1.-FSOLID)*DLMINSQ
      IF(DISTSQ.LT.DMINSQ) THEN
      DMINSQ = DISTSQ
      IMIN  = 1
      JMIN  = JWALL
      KMIN  = KWALL
      XDMIN = XD
      RDMIN = RD
      RTDMIN= RTD
      IF_FOUND = 1
      ENDIF
!
!     NEXT THE I = IM WALL.
!
      FSOLID = 1.0 -0.25*(MWALLIM(JM1,KM1,NBLCK)+MWALLIM(JP1,KP1,NBLCK) &
                        + MWALLIM(JM1,KP1,NBLCK)+MWALLIM(JP1,KM1,NBLCK))
      FSOLID = FSOLID*IM_SHEAR(NBLCK)
      XD  = X(IM,JWALL,KWALL,NBLCK)  - XP
      RD  = R(IM,JWALL,KWALL,NBLCK)  - RP
      RTD = RT(IM,JWALL,KWALL,NBLCK) - RTP
      DISTSQ = XD*XD + RD*RD + RTD*RTD
      DISTSQ = FSOLID*DISTSQ + (1.-FSOLID)*DLMINSQ
      IF(DISTSQ.LT.DMINSQ) THEN
      DMINSQ = DISTSQ
      IMIN  = IM
      JMIN  = JWALL
      KMIN  = KWALL
      XDMIN = XD
      RDMIN = RD
      RTDMIN= RTD
      IF_FOUND = 1
      ENDIF
!
  200 CONTINUE
mkcolg



Joined: 30 Jun 2004
Posts: 6218
Location: The Portland Group Inc.

Posted: Tue Apr 02, 2013 9:43 am

Hi Chris,

Quote:
Is there a better way to go about this, or is it a situation not geared towards parallelisation due to the need to store all the results?


You should be able to parallelize this code. Whether you get a speed-up is another question, but with OpenACC it's not much work to find out.

The problem with this code is the min reduction. If you were just looking for the minimum of each of these values independently of the others, you could simply use an OpenACC reduction. However, since you also want to keep the other values (IMIN, JMIN, KMIN, XDMIN, etc.) from the iteration where DISTSQ is at its minimum, you'll need to keep all of the candidate values (i.e. privatize them manually by turning them into temp arrays) and perform the min reduction afterwards, either on the host, which means more data movement, or in a sequential kernel on the device.
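
A minimal sketch of the temp-array approach, written in Fortran with OpenACC directives like the original code. It is simplified and hypothetical: the module, subroutine and argument names are made up, `x`, `r` and `rt` stand for one wall's coordinates indexed `(j,k)` (e.g. `X(1,JWALL,KWALL,NBLCK)`), and the FSOLID weighting and the I = IM wall are left out. Each `(j,k)` iteration writes only its own element of a 2-D temp array, so the two loops can be collapsed and run in parallel; the min search then runs sequentially on the host after the array is copied back.

```fortran
module wall_dist_2d
  implicit none
contains
  ! Hypothetical simplified search: FSOLID weighting and the I = IM wall
  ! are omitted. Pass 1 fills dist(j,k) in parallel; pass 2 scans it on
  ! the host to find the minimum and its (j,k) location.
  subroutine nearest_wall_point(x, r, rt, xp, rp, rtp, dminsq, jmin, kmin)
    real, intent(in)     :: x(:,:), r(:,:), rt(:,:)   ! coordinates indexed (j,k)
    real, intent(in)     :: xp, rp, rtp               ! the field point
    real, intent(out)    :: dminsq
    integer, intent(out) :: jmin, kmin
    real, allocatable    :: dist(:,:)                 ! temp array, one slot per (j,k)
    real    :: xd, rd, rtd
    integer :: j, k, nj, nk

    nj = size(x, 1)
    nk = size(x, 2)
    allocate(dist(nj, nk))

    ! Pass 1: no iteration reads or writes another iteration's data,
    ! so there is no scalar dependence left between iterations.
    !$acc parallel loop collapse(2) private(xd, rd, rtd) &
    !$acc   copyin(x, r, rt) copyout(dist)
    do k = 1, nk
      do j = 1, nj
        xd  = x(j, k)  - xp
        rd  = r(j, k)  - rp
        rtd = rt(j, k) - rtp
        dist(j, k) = xd*xd + rd*rd + rtd*rtd
      end do
    end do

    ! Pass 2: sequential min search on the host, same loop order and
    ! strict "<" test as the original code.
    dminsq = huge(dminsq)
    jmin = 0
    kmin = 0
    do k = 1, nk
      do j = 1, nj
        if (dist(j, k) < dminsq) then
          dminsq = dist(j, k)
          jmin = j
          kmin = k
        end if
      end do
    end do
    deallocate(dist)
  end subroutine nearest_wall_point
end module wall_dist_2d
```

XDMIN, RDMIN and RTDMIN can be recomputed afterwards from `(jmin,kmin)`. For this single-wall sketch the scan keeps the first minimum in the original (KWALL outer, JWALL inner) order, as the original IF test does. Compiled without OpenACC enabled (`-acc` with PGI), the `!$acc` lines are plain comments and the routine runs serially.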

Quote:
but this resulted in privatisation of these arrays preventing parallelisation.

I'm guessing this is because you only privatized the temp arrays on the KWALL index but are parallelizing both the KWALL and JWALL loops, so different JWALL iterations write to the same array elements. In this case, you'll need to either add a second dimension for the JWALL index to the temp arrays or parallelize only the outer KWALL loop.
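
The second option, parallelizing only the outer KWALL loop, can be sketched as below under the same simplifying assumptions (hypothetical names, one wall, no FSOLID weighting). Each `k` iteration runs its own sequential search over `j` using private scalars, so the temp arrays only need the `k` dimension; a short sequential pass over `k` then picks the overall minimum.

```fortran
module wall_dist_outer
  implicit none
contains
  ! Hypothetical simplified search: only the k loop is parallel. Each k
  ! iteration searches its own j values sequentially and writes a single
  ! entry of kdist/kjloc, so the temp arrays are 1-D over k.
  subroutine nearest_wall_point_outer(x, r, rt, xp, rp, rtp, dminsq, jmin, kmin)
    real, intent(in)     :: x(:,:), r(:,:), rt(:,:)   ! coordinates indexed (j,k)
    real, intent(in)     :: xp, rp, rtp               ! the field point
    real, intent(out)    :: dminsq
    integer, intent(out) :: jmin, kmin
    real, allocatable    :: kdist(:)                  ! min DISTSQ found for each k
    integer, allocatable :: kjloc(:)                  ! j at which that min occurs
    real    :: xd, rd, rtd, d, dbest
    integer :: j, k, nj, nk, jbest

    nj = size(x, 1)
    nk = size(x, 2)
    allocate(kdist(nk), kjloc(nk))

    !$acc parallel loop gang vector private(xd, rd, rtd, d, dbest, jbest) &
    !$acc   copyin(x, r, rt) copyout(kdist, kjloc)
    do k = 1, nk
      dbest = huge(dbest)
      jbest = 0
      ! Inner loop stays sequential within each k iteration.
      !$acc loop seq
      do j = 1, nj
        xd  = x(j, k)  - xp
        rd  = r(j, k)  - rp
        rtd = rt(j, k) - rtp
        d   = xd*xd + rd*rd + rtd*rtd
        if (d < dbest) then
          dbest = d
          jbest = j
        end if
      end do
      kdist(k) = dbest
      kjloc(k) = jbest
    end do

    ! Final min reduction over k, done sequentially on the host.
    dminsq = huge(dminsq)
    jmin = 0
    kmin = 0
    do k = 1, nk
      if (kdist(k) < dminsq) then
        dminsq = kdist(k)
        jmin = kjloc(k)
        kmin = k
      end if
    end do
    deallocate(kdist, kjloc)
  end subroutine nearest_wall_point_outer
end module wall_dist_outer
```

This version exposes only as much parallelism as there are KWALL values, so it may leave the GPU underused when the K range is small; the 2-D temp-array version trades more memory and data movement for more parallel work.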

Hope this helps,
Mat
All times are GMT - 7 Hours