Joined: 30 Jun 2004
Location: The Portland Group Inc.
Posted: Wed Sep 07, 2005 12:14 pm
More information about "-mp=numa" as well as NUMA (Non-uniform memory access) can be found in the PGI release notes.
Basically, "-mp=numa" links your application with the NUMA libraries. (See section 3.2.2 of the PGI release notes for a complete list of operating systems that support NUMA.) Using NUMA can improve the performance of some parallel applications by reducing memory latency. Linking with "-mp=numa" also allows you to use the environment variables "MP_BIND", "MP_BLIST", and "MP_SPIN".
When "MP_BIND" is set to "yes", parallel processes or threads are bound to a physical processor. This helps ensure that the kernel won't move your process to a different CPU while it's running.
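To illustrate what binding means at the OS level, here is a minimal sketch that pins the calling thread to one CPU using the Linux sched_setaffinity call. This is only an illustration, not the PGI runtime's actual code; the function names bind_current_thread and first_allowed_cpu are made up for this example, and the code assumes Linux with glibc.

```c
#define _GNU_SOURCE            /* needed for cpu_set_t and the CPU_* macros; must come first */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (bad CPU number, or the kernel
 * refused, e.g. because the CPU is outside the allowed cpuset). */
static int bind_current_thread(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    assert(CPU_COUNT(&set) == 1);   /* exactly one CPU in the mask */

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: lowest-numbered CPU the calling thread may
 * currently run on, or -1 if the affinity mask cannot be read. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set))
            return cpu;
    }
    return -1;
}
```

Once the mask contains a single CPU, the scheduler can no longer migrate the thread, which is the effect described above.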
Using "MP_BLIST", you can specify exactly which processors your threads are bound to. For example, on a quad-processor, dual-core system (8 CPUs), you can set the list so that the threads are interleaved across the 4 nodes ("MP_BLIST=2,4,6,0,1,3,5,7") or bound to a particular node ("MP_BLIST=6,7").
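As a rough sketch of how a comma-separated CPU list like the ones above could be interpreted, the following parses the string into an array and maps thread numbers onto it. The names parse_cpu_list and cpu_for_thread are hypothetical, and the wrap-around when there are more threads than list entries is an assumption of this example, not documented PGI behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical parser: turn "2,4,6,0" into out[] = {2,4,6,0}.
 * Returns the number of entries, or -1 if the string is malformed
 * or holds more than max entries. */
static int parse_cpu_list(const char *s, int *out, int max)
{
    assert(s != NULL && out != NULL);

    int n = 0;
    while (*s != '\0') {
        char *end;
        long v = strtol(s, &end, 10);

        /* no digits consumed, negative CPU number, or list too long */
        if (end == s || v < 0 || n >= max)
            return -1;
        out[n++] = (int)v;

        if (*end == ',') {
            s = end + 1;
            if (*s == '\0')          /* trailing comma */
                return -1;
        } else if (*end == '\0') {
            s = end;                 /* end of list */
        } else {
            return -1;               /* unexpected character */
        }
    }
    return n;
}

/* Assumption for this sketch: thread i uses entry i of the list,
 * wrapping around if there are more threads than entries. */
static int cpu_for_thread(const int *list, int n, int thread)
{
    return list[thread % n];
}
```

With the list "6,7", this sketch would place threads 0 and 2 on CPU 6 and threads 1 and 3 on CPU 7.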
Threads at a barrier in a parallel region check a semaphore to determine whether they can proceed. If the semaphore is not free after a certain number of tries, the thread gives up the processor (via sched_yield) for a while before checking again. The "MP_SPIN" variable defines the number of times a thread checks the semaphore before calling sched_yield. Setting "MP_SPIN" to -1 tells the thread to never call sched_yield. This can help performance, but if the thread spends a significant amount of time in a barrier, it wastes CPU cycles that a different process could have used.
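The spin-then-yield behavior can be sketched roughly as follows. This is not the PGI runtime's code, just an illustration of the idea using C11 atomics; wait_for_flag, spin_limit_from_env and the flag itself are hypothetical names, and a plain atomic flag stands in for the semaphore.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical helper: read a spin count from an environment variable,
 * falling back to a default if it is unset or not a clean integer. */
static long spin_limit_from_env(const char *name, long fallback)
{
    const char *s = getenv(name);
    if (s == NULL || *s == '\0')
        return fallback;

    char *end;
    long v = strtol(s, &end, 10);
    return (*end == '\0') ? v : fallback;
}

/* Wait until *flag becomes nonzero. The thread polls the flag up to
 * spin_limit times, then calls sched_yield() and starts counting again.
 * A negative spin_limit means "never yield, just keep spinning".
 * Returns how many times sched_yield() was called. */
static unsigned long wait_for_flag(atomic_int *flag, long spin_limit)
{
    assert(flag != NULL);

    unsigned long yields = 0;
    long tries = 0;

    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        if (spin_limit < 0)
            continue;                /* pure busy-wait */
        if (++tries >= spin_limit) {
            sched_yield();           /* let another runnable thread use the CPU */
            yields++;
            tries = 0;
        }
    }
    return yields;
}
```

A large spin count or -1 keeps latency low when the wait is short; a small count gives the CPU back sooner when other processes are competing for it.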
Hope this helps,