PGI User Forum

Different results on CPU and GPU
 
anirbanjana



Joined: 11 Aug 2012
Posts: 28

Posted: Tue Nov 19, 2013 4:56 pm    Post subject: Different results on CPU and GPU

Hi,
For the code below, I get different results on the CPU and the GPU. The differences appear in the array PFT; they change from run to run, and occasionally a run produces no difference at all.

Code:

!$acc data copy(NWALLS, MAX_PIP, WALLDTSPLIT, WALLCONTACT(1:MAX_PIP,1:NWALLS), PEA(1:MAX_PIP,1:4), &
!$acc&          DES_POS_NEW(1:MAX_PIP,1:DIMN), W_POS_L(1:MAX_PIP,1:NWALLS,1:DIMN), PFT(1:MAX_PIP,1:MAXNEIGHBORS,1:DIMN) )
!$acc parallel
!$acc loop gang, private(LL, NI, IW, PFT_TMP(1:DIMN), DIST(1:DIMN) )

      DO LL = 1, MAX_PIP
         IF(.NOT.PEA(LL,1) .OR. PEA(LL,4) ) CYCLE

         DO IW = 1, NWALLS

               IF(.NOT.WALLDTSPLIT .OR. PEA(LL,2) .OR. PEA(LL,3) .OR. WALLCONTACT(LL,IW).NE.1 ) GOTO 200

               NI=IW !Line added by AJ for debugging
               DIST(:)=ZERO  !Line added by AJ for debugging
               DIST(:) = w_pos_l(LL,IW,:) - DES_POS_NEW(LL,:)

! Save the tangential displacement history with the correction of Coulomb's law
                  PFT_TMP(:)=DIST(:)
                  IF (PARTICLE_SLIDE) THEN
                  ELSE
                     PFT(LL,NI,:) = PFT_TMP(:)
                  ENDIF

                  PARTICLE_SLIDE = .FALSE.

 200           CONTINUE
            ENDDO ! DO IW = 1, NWALLS
      ENDDO !Loop over particles LL to calculate wall contact
!$acc end parallel
!$acc end data
 


When I comment out the line
Code:

DIST(:) = w_pos_l(LL,IW,:) - DES_POS_NEW(LL,:)

I get an identical PFT array from the CPU and GPU runs.

I also checked that the arrays DES_POS_NEW and W_POS_L remain the same even when PFT differs.

Note that the code pasted above is a stripped-down version of part of the file model/des/calc_force_des.f from the MFIX code.

Best
Anirban
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Tue Nov 19, 2013 5:57 pm

Hi Anirban,

I don't see anything obvious. But since you indicate that the issue might be tied to DIST, I'd try manually privatizing it, as well as PFT_TMP, by adding a leading particle dimension (i.e. DIST(1:MAX_PIP,1:DIMN)) so each LL iteration works on its own slice. This should also improve performance: with LL as the first index, Fortran's column-major layout means the data is accessed contiguously across threads.

FYI, NWALLS and MAX_PIP don't need to be copied since scalars are implicitly firstprivate.
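
For example, a sketch of how the directives could then look (not tested; this assumes DIST and PFT_TMP have been reallocated with the leading MAX_PIP dimension and are dropped from the private clause):

Code:

!$acc data copy(WALLDTSPLIT, WALLCONTACT(1:MAX_PIP,1:NWALLS), PEA(1:MAX_PIP,1:4),  &
!$acc&          DES_POS_NEW(1:MAX_PIP,1:DIMN), W_POS_L(1:MAX_PIP,1:NWALLS,1:DIMN), &
!$acc&          PFT(1:MAX_PIP,1:MAXNEIGHBORS,1:DIMN),                              &
!$acc&          DIST(1:MAX_PIP,1:DIMN), PFT_TMP(1:MAX_PIP,1:DIMN) )
!$acc parallel
!$acc loop gang, private(LL, NI, IW)

Inside the loop, the references then become DIST(LL,:) and PFT_TMP(LL,:).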

- Mat
anirbanjana



Joined: 11 Aug 2012
Posts: 28

Posted: Wed Nov 20, 2013 4:25 am

Hi Mat,
Thanks very much for the prompt feedback. Per your advice, I made the arrays DIST and PFT_TMP manually private:
Code:

 DOUBLE PRECISION, dimension(:,:),allocatable:: DIST, PFT_TMP
 ALLOCATE(DIST(MAX_PIP, DIMN) )
 ALLOCATE(PFT_TMP(MAX_PIP,DIMN) )
....
....
!$acc data copy(NWALLS, MAX_PIP, WALLDTSPLIT, WALLCONTACT(1:MAX_PIP,1:NWALLS), PEA(1:MAX_PIP,1:4), &
!$acc&          DES_POS_NEW(1:MAX_PIP,1:DIMN), W_POS_L(1:MAX_PIP,1:NWALLS,1:DIMN), PFT(1:MAX_PIP,1:MAXNEIGHBORS,1:DIMN), &
!$acc&          DIST(1:MAX_PIP,1:DIMN), PFT_TMP(1:MAX_PIP,1:DIMN) )
!$acc parallel
!$acc loop gang, private(LL, NI, IW )

      DO LL = 1, MAX_PIP
         IF(.NOT.PEA(LL,1) .OR. PEA(LL,4) ) CYCLE

         DO IW = 1, NWALLS

               IF(.NOT.WALLDTSPLIT .OR. PEA(LL,2) .OR. PEA(LL,3) .OR. WALLCONTACT(LL,IW).NE.1 ) GOTO 200

               NI=IW !Line added by AJ for debugging
               DIST(LL,:)=ZERO  !Line added by AJ for debugging
               DIST(LL,:) = w_pos_l(LL,IW,:) - DES_POS_NEW(LL,:)

! Save the tangential displacement history with the correction of Coulomb's law
                  PFT_TMP(LL,:)=DIST(LL,:)
                  IF (PARTICLE_SLIDE) THEN
                  ELSE
                     PFT(LL,NI,:) = PFT_TMP(LL,:)
                  ENDIF

                  PARTICLE_SLIDE = .FALSE.

 200           CONTINUE
            ENDDO ! DO IW = 1, NWALLS
      ENDDO !Loop over particles LL to calculate wall contact
!$acc end parallel
!$acc end data


There is still a small difference between the CPU and GPU results, but now the difference stayed the same when I repeated the run four times:
Code:

363566c363566
<    0.000E+00   0.000E+00   0.000E+00
---
>    0.000E+00   0.120E-02   0.000E+00


As for MAX_PIP and NWALLS, I originally did not copy them explicitly, but I am doing so now just to be extra cautious. Does it hurt to do so? In fact, I would like them to be shared (a single copy on the GPU) rather than firstprivate.

I can upload my branch of MFIX for your perusal. It will be a great help if it can be tested at PGI.

Best
Anirban
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Wed Nov 20, 2013 10:57 am

Quote:
Does it hurt to do so? In fact, I would like them to be shared (a single copy on the GPU) rather than firstprivate.
Putting scalars in a copy clause makes them global. firstprivate instead creates a local scalar in the kernel, which increases the likelihood that it will be put into a register. It may not matter much given there's only one reference, but it is one less global memory access.
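
As a rough sketch of the two variants (directive lines only, using the names from your code):

Code:

! Scalars in a copy clause: a single global copy of each on the device
!$acc data copy(NWALLS, MAX_PIP, PFT(1:MAX_PIP,1:MAXNEIGHBORS,1:DIMN))
!$acc parallel

! Scalars left out of the data clauses: each is firstprivate in the
! parallel region, so it can live in a register
!$acc data copy(PFT(1:MAX_PIP,1:MAXNEIGHBORS,1:DIMN))
!$acc parallel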

Quote:
I can upload my branch of MFIX for your perusal. It will be a great help if it can be tested at PGI.
That should be fine. Once the port is complete, it would be good to put the code into our QA testing, but before that I can test it manually. I'll need to know the name of the CVS server, since the one listed in the docs is either wrong or only accessible within ORNL. Let's take any connection questions offline, though.

- Mat
anirbanjana



Joined: 11 Aug 2012
Posts: 28

Posted: Wed Nov 20, 2013 11:47 am

Thanks much Mat.

Knowing the default treatment of scalars will definitely come in handy during the performance tuning phase. I now recall reading about it in another post on this forum.

It's best to FTP you the branch of MFIX I am working with. You sent me the following FTP site in an earlier post:
https://www.pgroup.com/support/ftp_access.php
I was thinking of uploading it there. Is that OK?

Best
Anirban
Page 1 of 3

 