PGI User Forum

PGI FORTRAN OpenMP: poor performance in a big loop???
 
Nick Kong



Joined: 08 Jun 2012
Posts: 11

Posted: Mon Jun 11, 2012 7:34 am

Dear Mat,

Please try the following Fortran code. The parallel version runs much slower than the sequential one (to get the sequential version, just remove the "!$OMP" directives):

SUBROUTINE OpenMP_EX()
  IMPLICIT NONE

  INTEGER :: I, J, K
  DOUBLE PRECISION :: A

  CALL omp_set_num_threads(10)

  DO J = 1, 10000
    !$OMP PARALLEL
    !$OMP DO PRIVATE(I)
    DO I = 1, 10
      !$OMP TASK
      DO K = 1, 10**3
        A = DEXP(1D0)
      END DO
      !$OMP END TASK
    END DO
    !$OMP END DO
    !$OMP END PARALLEL
    !WRITE(*,*) "J = ", J
  END DO

END SUBROUTINE OpenMP_EX
Nick Kong



Joined: 08 Jun 2012
Posts: 11

Posted: Mon Jun 11, 2012 7:57 am

Hi, Mat,

The following Fortran code is a more typical case, showing that the "parallel" run is much slower than the "sequential" one:


SUBROUTINE OpenMP_EX()
  IMPLICIT NONE

  INTEGER :: I, J, K
  DOUBLE PRECISION :: A

  CALL omp_set_num_threads(10)

  DO J = 1, 1000
    !$OMP PARALLEL
    !$OMP DO PRIVATE(I)
    DO I = 1, 10
      !$OMP TASK
      DO K = 1, 10**5
        A = DEXP(1D0)
      END DO
      !$OMP END TASK
    END DO
    !$OMP END DO
    !$OMP END PARALLEL
    WRITE(*,*) "J = ", J
  END DO

END SUBROUTINE OpenMP_EX
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Mon Jun 11, 2012 11:39 am

Thanks Nick, that helped. The problem here is that by default we don't destroy and then recreate threads. Instead, we put the threads into an active wait mode (OMP_WAIT_POLICY=ACTIVE), where they actively spin on a barrier waiting to be reused. Almost all of your program's time is being spent waiting on this barrier.

The number of cycles a thread spends checking the barrier while it waits is set via the environment variable "MP_SPIN". So the fix is to set MP_SPIN to a small value (like 0 for no wait). The caveat is that your CPU utilization would be pegged at 100% for all threads even when your program was not running in parallel.
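To compare settings, it can help to time the repeated parallel regions directly. The following is a minimal sketch, not code from this thread: the program name time_regions, the repeat count nregions, and the array a (used so each iteration writes its own element and the work is not optimized away) are all assumptions, and the TASK construct from the posted example is left out to keep the measurement simple. It uses only the standard OpenMP routines omp_get_wtime and omp_get_max_threads.

```fortran
! Hypothetical timing driver (not from the thread): times many short
! parallel regions so that runs made under different MP_SPIN or
! OMP_WAIT_POLICY settings can be compared.
PROGRAM time_regions
  USE omp_lib
  IMPLICIT NONE
  INTEGER, PARAMETER :: nregions = 1000   ! assumed repeat count
  INTEGER :: i, j, k
  DOUBLE PRECISION :: a(10), t0, t1

  a = 0D0
  t0 = omp_get_wtime()
  DO j = 1, nregions
    ! One parallel region is entered and left per j iteration,
    ! so the threads wait at a barrier nregions times.
    !$OMP PARALLEL DO PRIVATE(i, k)
    DO i = 1, 10
      DO k = 1, 1000
        a(i) = a(i) + DEXP(1D-3 * k)
      END DO
    END DO
    !$OMP END PARALLEL DO
  END DO
  t1 = omp_get_wtime()

  WRITE(*,*) "threads            = ", omp_get_max_threads()
  WRITE(*,*) "total seconds      = ", t1 - t0
  WRITE(*,*) "seconds per region = ", (t1 - t0) / nregions
  WRITE(*,*) "checksum           = ", SUM(a)
END PROGRAM time_regions
```

Build it with OpenMP enabled (for example -mp with the PGI compilers), run it once per setting, and compare the "seconds per region" figures. The absolute numbers will depend on the machine and thread count.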

Hope this helps,
Mat
Nick Kong



Joined: 08 Jun 2012
Posts: 11

Posted: Mon Jun 11, 2012 12:24 pm

Dear Mat,

Thank you for your help!

I tried "C>set MP_SPIN=0" to set MP_SPIN to zero and reran the example code, but the parallel version is still slow. Is this the right way to set MP_SPIN to zero?

Nick
Nick Kong



Joined: 08 Jun 2012
Posts: 11

Posted: Mon Jun 11, 2012 12:48 pm

Hi, Mat,

I also tried "c:>set OMP_WAIT_POLICY=PASSIVE" (Windows 7) and reran the example code, but the parallel version is still slow! Is this the correct way to set OMP_WAIT_POLICY to PASSIVE, or do I need to restart my PC after changing this setting?

Nick
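One way to check whether a setting like this actually reaches the program is to print the variables from inside a small test executable. The sketch below is an illustration only, not code from this thread: the program name show_omp_env and the helper show are hypothetical. It relies on the standard Fortran 2003 intrinsic GET_ENVIRONMENT_VARIABLE.

```fortran
! Hypothetical check (not from the thread): prints MP_SPIN and
! OMP_WAIT_POLICY as the running process sees them.
PROGRAM show_omp_env
  IMPLICIT NONE
  CALL show("MP_SPIN")
  CALL show("OMP_WAIT_POLICY")
CONTAINS
  SUBROUTINE show(name)
    CHARACTER(LEN=*), INTENT(IN) :: name
    CHARACTER(LEN=256) :: val
    INTEGER :: n, stat
    ! stat = 0: value returned; stat = 1: variable does not exist;
    ! any other value: truncated value or unsupported request.
    CALL GET_ENVIRONMENT_VARIABLE(name, val, n, stat)
    IF (stat == 0) THEN
      WRITE(*,*) name, " = ", val(1:n)
    ELSE IF (stat == 1) THEN
      WRITE(*,*) name, " is not set"
    ELSE
      WRITE(*,*) name, " could not be read, status ", stat
    END IF
  END SUBROUTINE show
END PROGRAM show_omp_env
```

Run it from the same command prompt window in which "set" was typed. A variable defined with "set" in a Windows command prompt is visible only to that window and to programs started from it, so a program launched from another window or from an IDE will not see it.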