PGI User Forum

OpenMP and magically evil performance results

 
AndrewWilson41729



Joined: 19 Dec 2013
Posts: 14

Posted: Thu Jun 26, 2014 2:50 pm    Post subject: OpenMP and magically evil performance results

I have a fairly large code which uses OpenMP heavily.

My machine is a dual-processor Xeon E5-2620: 2 packages, 6 physical cores per package, and hyperthreading with 2 threads per core, for a total of 24 virtual processors.

I intermittently experience a condition where the multi-processing...fails? Properties of this condition include:
- The program returns the correct result...eventually.
- When using 1 thread, the program maxes out one processor
- When using 2-6 threads, the program has a speedup of 1.5-3-ish, with CPU utilization somewhat less than the threads/total-virtual-cores ratio.
- At some number of threads (sometimes 13, sometimes a different number) the CPU utilization goes to less than 4%, or less than a single maxed-out core.
- The condition persists through reboots, delete-all-compiled-code-and-rebuild-s, and other attempts to fix it
- The condition magically goes away for weeks at a time

Sometimes I get a reasonable speedup vs number-of-cores curve (with a jump down at 7 cores, and otherwise a convex curve landing on ~10x), and sometimes my speedup curve suddenly becomes a slowdown curve.

Both good and bad results can come from the same code. I have tried to disable all "turn my processor down" power management settings.

Does anyone have any idea what's going on here?
AndrewWilson41729



Joined: 19 Dec 2013
Posts: 14

Posted: Thu Jun 26, 2014 3:01 pm    Post subject:

A follow-up on some noted PGI/Intel differences in OpenMP.

When things are working well with my code, and I set the number of threads to 12, the program executes on one physical processor. That is to say, all 12 processes are launched on the 12 virtual cores corresponding to the 6 physical cores of one of the two packages.

This is exactly wrong from a performance point of view for my particular case (where it would be more optimal to use 12 threads across as many physical cores as possible). Intel OpenMP has some flags for setting this:
thread affinity interface

Is there any equivalent interface that I have missed for PGI OpenMP?
mkcolg



Joined: 30 Jun 2004
Posts: 6663
Location: The Portland Group Inc.

Posted: Fri Jun 27, 2014 10:42 am    Post subject:

Hi Andrew,

Quote:
Is there any equivalent interface that I have missed for PGI OpenMP?
Yes, "MP_BIND=yes" to enable thread/core binding. This will bind thread 0 to core 0, thread 1 to core 1, etc. Optionally, you can set "MP_BLIST=0,2,4,.." to set the specific binding order.

There's also the OpenMP standard "OMP_PROC_BIND=true" environment variable.

If you're on Linux, I find the "numactl" utility useful as well. It sets the binding but is agnostic to the compiler so the same settings can be used without concern for how the binary was built. (See "man numactl" for details)

Note you might be better off halving the number of threads and running only on the physical cores. Hyper-threading is useful for fast context switching, but only one thread can use a core at a time. Hence if your threads are engaged in heavy computation, there will be contention for the core. It may not be better, but it's worth an experiment.
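As a rough illustration of the binding advice above, the following Python sketch picks one logical CPU per physical core and formats the result as MP_BIND/MP_BLIST settings. The topology table, the helper names `one_cpu_per_core` and `binding_env`, and the CPU numbering (hyperthread siblings numbered adjacently) are assumptions made for the example. The real numbering on a given machine has to be checked first, for instance with lscpu on Linux.

```python
def one_cpu_per_core(topology):
    """Return the lowest-numbered logical CPU of each physical core.

    topology maps a logical CPU id to a (package id, core id) pair.
    """
    first = {}
    for cpu in sorted(topology):
        # Keep only the first logical CPU seen for each physical core.
        first.setdefault(topology[cpu], cpu)
    return sorted(first.values())


def binding_env(cpus):
    """Format a CPU list as the PGI binding variables named in the thread."""
    return {"MP_BIND": "yes", "MP_BLIST": ",".join(str(c) for c in cpus)}


# Hypothetical numbering for 2 packages x 6 cores x 2 hyperthreads:
# CPUs 0-11 sit on package 0, CPUs 12-23 on package 1, and each
# even/odd pair (0,1), (2,3), ... shares one physical core.
topology = {cpu: (cpu // 12, (cpu % 12) // 2) for cpu in range(24)}

cpus = one_cpu_per_core(topology)
env = binding_env(cpus)
for name, value in env.items():
    print(f"export {name}={value}")
```

With this assumed numbering the list comes out as the even CPU ids, so 12 threads would land on 12 distinct physical cores spread over both packages.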

Another environment variable to try is "MP_SPIN". This sets the number of times the OpenMP runtime checks a semaphore before putting the thread to "sleep" (sched_yield) when the thread is blocked. Setting "MP_SPIN=-1" says to never sleep, which saves the cost of saving and restoring the thread. However, the thread will poll and thus take up computational resources, which may be a problem if you oversubscribe your physical cores or have other applications running.
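To make the spin-then-yield trade-off concrete, here is a small Python sketch of the general idea. It is not the PGI runtime's implementation: `wait_for`, `ready_after`, and the exact counting rule are hypothetical, chosen only to show how a spin count changes the number of times a blocked thread gives up the CPU.

```python
import itertools
import os
import time

# os.sched_yield exists on Unix; fall back to a zero-length sleep elsewhere.
yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


def wait_for(is_ready, spin_count):
    """Poll is_ready() until it returns True.

    After spin_count failed polls, yield the CPU after every further
    failed poll. A negative spin_count means never yield (pure spinning).
    Returns (polls, yields).
    """
    polls = yields = 0
    while True:
        polls += 1
        if is_ready():
            return polls, yields
        if spin_count >= 0 and polls >= spin_count:
            yield_cpu()
            yields += 1


def ready_after(n):
    """Hypothetical condition that becomes true on the n-th poll."""
    calls = itertools.count(1)
    return lambda: next(calls) >= n


print(wait_for(ready_after(5), spin_count=3))   # (5, 2)
print(wait_for(ready_after(5), spin_count=-1))  # (5, 0)
```

The pure-spin case never yields, which is cheap when a core is dedicated to the thread and wasteful when other threads are waiting for that core.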

Hope this helps,
Mat
AndrewWilson41729



Joined: 19 Dec 2013
Posts: 14

Posted: Mon Jun 30, 2014 2:06 pm    Post subject:

Thanks, Mat. Helps some.

I get distinctly conflicting impressions about hyperthreading depending on what/where I read. But at any rate, MP_BLIST and MP_BIND are enough to tweak the affinity and assign threads to physical cores (which is mostly the best way to do it, from my experiments).

Any thoughts on the magical OpenMP-not-working situation? I don't have it right now, but I don't know what made it go away and what will make it come back.
mkcolg



Joined: 30 Jun 2004
Posts: 6663
Location: The Portland Group Inc.

Posted: Tue Jul 01, 2014 1:10 pm    Post subject:

Quote:
Any thoughts on the magical OpenMP-not-working situation?
My best guess is that the OS was scheduling all the threads on the same core. Let's see if it still persists after you start binding.
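One way to test that guess on Linux is to look at which CPU each thread of the running program last executed on. The sketch below assumes a Linux /proc filesystem, where field 39 of /proc/<pid>/task/<tid>/stat is the CPU number last executed on. The helper names `last_cpu` and `threads_per_cpu` are made up for this example.

```python
import glob
from collections import Counter


def last_cpu(stat_line):
    """Extract the 'processor' field (field 39) from a /proc stat line."""
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so split after the last ')'. The remainder starts at field 3.
    fields = stat_line.rsplit(")", 1)[1].split()
    return int(fields[39 - 3])


def threads_per_cpu(pid):
    """Count how many threads of pid last ran on each CPU (Linux only)."""
    counts = Counter()
    for path in glob.glob(f"/proc/{pid}/task/*/stat"):
        try:
            with open(path) as f:
                counts[last_cpu(f.read())] += 1
        except OSError:
            pass  # the thread exited between listing and reading
    return counts


# Synthetic stat line: 36 filler fields, then CPU 7 in field 39.
sample = "4242 (my prog) " + " ".join(["0"] * 36 + ["7"] + ["0"] * 13)
print(last_cpu(sample))  # 7
```

Calling `threads_per_cpu(pid)` a few times while the program is in its slow state would show whether the thread counts pile up on one CPU or are spread out.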

- Mat
All times are GMT - 7 Hours