PGI User Forum
Am I creating too many OpenMP Tasks?

 
_sayan_
Posted: Tue Jul 17, 2012 4:25 pm

Hello,

I want to compare an OpenMP tasking implementation with an OpenMP loop (worksharing) implementation. Please review my code structure below.

Code:

!$OMP PARALLEL
...
!$OMP SINGLE
CALL abc()
CALL def()
CALL ghi()
!$OMP END SINGLE
!$OMP END PARALLEL


Inside the subroutines:

Code:

SUBROUTINE abc()
...
DO i=i0,i1
   DO j=j0,j1
      DO k=k0,k1
         !$OMP TASK UNTIED
         <embarrassingly parallel computation>
         !$OMP END TASK
      ENDDO
   ENDDO
ENDDO


This code takes a long time to run. The array bounds are quite large, so one task is created for every (i,j,k) iteration. Am I just creating too many tasks for the compiler's OpenMP runtime to handle?

Thank you,
Sayan
mkcolg (The Portland Group Inc.)
Posted: Thu Jul 19, 2012 10:29 am

Hi Sayan,

I've discussed your issue with several of our other application engineers, and we aren't really sure what the problem is. It's possible that, if each parallel computation is very small but there is an extremely large number of tasks, your run time is being dominated by the overhead of creating and managing tasks.

One thing to try is to record your times at 1, 2, 4, and the maximum number of threads. If the program doesn't scale, i.e. the times are roughly the same, then this may indeed be the problem. The fix would then be to give each TASK more work.

If this isn't the problem, we'd need to see a reproducing example to tell what's wrong.

Hope this helps,
Mat
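
A minimal sketch of the scaling check suggested above, not the original poster's code. The program name, the array a, the problem size n, and the sqrt workload are all hypothetical stand-ins; the per-element task is deliberately tiny so that task overhead is visible. It assumes an OpenMP 3.0 capable compiler and the standard omp_lib module.

```fortran
program task_scaling
  use omp_lib
  implicit none
  integer, parameter :: n = 200000        ! hypothetical problem size
  integer :: counts(4), c, i, nt
  double precision, allocatable :: a(:)
  double precision :: t0, t1

  allocate(a(n))
  ! Thread counts to test: 1, 2, 4, and the runtime's maximum
  counts = (/ 1, 2, 4, omp_get_max_threads() /)

  do c = 1, 4
     nt = counts(c)
     a = 0.0d0
     t0 = omp_get_wtime()
     !$omp parallel num_threads(nt) default(none) shared(a) private(i)
     !$omp single
     do i = 1, n
        ! One task per element: deliberately fine-grained
        !$omp task default(none) shared(a) firstprivate(i)
        a(i) = sqrt(dble(i))
        !$omp end task
     end do
     !$omp end single
     !$omp end parallel
     t1 = omp_get_wtime()
     print '(a,i4,a,f10.4,a)', 'threads = ', nt, '   time = ', t1 - t0, ' s'
  end do

  deallocate(a)
end program task_scaling
```

If the reported times stay roughly flat (or grow) as the thread count rises, that would be consistent with task creation and management dominating the run time, and giving each task more work is the natural next experiment.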
_sayan_
Posted: Wed Jul 25, 2012 7:10 am

Hello Mat,

Thanks for answering. The code I am testing has sufficient computation. I think this is a data scoping issue: I am expecting variables that are shared in the OMP PARALLEL region to remain shared in the nested TASK constructs as well. Updated code structure:

Code:

!$OMP PARALLEL SHARED(...) PRIVATE(...) FIRSTPRIVATE(...)
!$OMP SINGLE
!$OMP TASK UNTIED
CALL test(...)
!$OMP END TASK
!$OMP END SINGLE
!$OMP END PARALLEL


In the test function:

Code:

SUBROUTINE test(...)
...
DO i=i0,in
   DO j=j0,jn
      DO k=k0,kn
         !$OMP TASK SHARED(...) FIRSTPRIVATE(...) !variables are either shared or firstprivate here
         <embarrassingly parallel code>
         !$OMP END TASK
      ENDDO
   ENDDO
ENDDO
!$OMP TASKWAIT
END SUBROUTINE test


The code runs indefinitely. With this type of code structure, do you have any advice for avoiding scoping-related pitfalls? Thank you.
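
A minimal sketch of one way to guard against scoping surprises in an orphaned TASK, under stated assumptions. As I read the OpenMP 3.x data-sharing rules, a TASK with no DEFAULT clause treats a variable as shared only if it is shared in every enclosing construct up to the innermost PARALLEL; otherwise it becomes firstprivate, and in an orphaned TASK the dummy arguments are firstprivate by default. Listing every variable explicitly under DEFAULT(NONE) makes the compiler reject anything left unscoped. The module, the subroutine fill, and the array a are hypothetical names, not the original poster's code.

```fortran
module scoping_demo
  implicit none
contains
  subroutine fill(a)
    double precision, intent(inout) :: a(:)   ! dummy argument
    integer :: i
    do i = 1, size(a)
       ! DEFAULT(NONE) forces every variable used in the task to be listed.
       ! SHARED(a) is stated explicitly so the tasks write to the caller's array.
       !$omp task default(none) shared(a) firstprivate(i)
       a(i) = 2.0d0 * dble(i)
       !$omp end task
    end do
    ! Wait for the child tasks created above before returning
    !$omp taskwait
  end subroutine fill
end module scoping_demo

program scoping_test
  use scoping_demo
  implicit none
  integer, parameter :: n = 1000              ! hypothetical size
  double precision :: a(n)

  a = 0.0d0
  !$omp parallel default(none) shared(a)
  !$omp single
  call fill(a)
  !$omp end single
  !$omp end parallel

  ! The sum of 2*i for i = 1..n is n*(n+1)
  print *, 'sum =', sum(a), ' expected =', dble(n) * dble(n + 1)
end program scoping_test
```

The two printed values should agree if the tasks really updated the shared array. A mismatch would point to each task working on its own copy.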
_sayan_
Posted: Thu Aug 02, 2012 4:20 pm

I was creating too many tasks. When I moved the task construct to just after the outermost loop, my code ran.
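
A minimal sketch of what the restructured loop nest might look like, assuming "just after the outermost loop" means the TASK now sits directly inside the outermost DO, so that each task covers the whole inner j and k loops. The bounds ni, nj, nk, the array a, and the assignment are hypothetical placeholders for the real computation.

```fortran
program coarse_tasks
  implicit none
  integer, parameter :: ni = 64, nj = 64, nk = 64   ! hypothetical bounds
  double precision, allocatable :: a(:,:,:)
  integer :: i, j, k

  allocate(a(nk, nj, ni))
  a = 0.0d0

  !$omp parallel default(none) shared(a) private(i, j, k)
  !$omp single
  do i = 1, ni
     ! One task per outer iteration: ni tasks instead of ni*nj*nk
     !$omp task default(none) shared(a) firstprivate(i) private(j, k)
     do j = 1, nj
        do k = 1, nk
           a(k, j, i) = dble(i + j + k)
        end do
     end do
     !$omp end task
  end do
  !$omp end single
  !$omp end parallel

  print *, 'tasks created:', ni, ' instead of', ni * nj * nk
  print *, 'checksum =', sum(a)
  deallocate(a)
end program coarse_tasks
```

With these placeholder bounds the task count drops from 262144 to 64, and each task carries nj*nk iterations of work, which is the "give each TASK more work" remedy in concrete form.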