PGI User Forum

-mp=numa

 
santex

Joined: 07 Oct 2004
Posts: 1

Posted: Wed Sep 07, 2005 1:25 am    Post subject: -mp=numa

Hi,

The PGI compiler documentation doesn't really say much
about the new -mp=numa option.

Can someone please explain how a program can benefit from
this optimization?

Best regards,
Alex
mkcolg

Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Wed Sep 07, 2005 12:14 pm

Hi Alex,

More information about "-mp=numa" as well as NUMA (Non-uniform memory access) can be found in the PGI release notes.

Basically, "-mp=numa" links your application with the NUMA libraries. (See section 3.2.2 of the PGI release notes for a complete list of operating systems that support NUMA.) Using NUMA can improve the performance of some parallel applications by reducing memory latency. Linking with "-mp=numa" also lets you use the environment variables "MP_BIND", "MP_BLIST", and "MP_SPIN".
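
For illustration, here is a minimal OpenMP program and the kind of compile line that would pull in the NUMA runtime. The file name and the pgcc command line below are just a sketch, not taken from the release notes:

Code:
/* omp_numa_example.c -- minimal OpenMP example (illustrative only).
 *
 * A hypothetical build line that links the NUMA libraries:
 *   pgcc -mp=numa omp_numa_example.c -o omp_numa_example
 */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Each thread in the parallel region reports its thread number. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}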

When "MP_BIND" is set to "yes", parallel processes or threads are bound to a physical processor. This helps ensure that the kernel won't move your process to a different CPU while it's running.

Using "MP_BLIST", you can specify exactly which processors to attach your process to. For example, if you have a Quad Dual-Core System (8 CPUS), you can set the blist so that the processes are interleaved across the 4 nodes ("MP_BLIST=2,4,6,0,1,3,5,7") or bound to a particular node ("MP_BLIST=6,7").

Threads at a barrier in a parallel region check a semaphore to determine if they can proceed. If the semaphore is not free after a certain number of tries, the thread gives up the processor (via sched_yield) for a while before checking again. The "MP_SPIN" variable defines the number of times a thread checks a semaphore before calling sched_yield. Setting MP_SPIN to -1 tells the thread to never call sched_yield. This can help performance but can waste CPU cycles that could be used by a different process if the thread spends a significant amount of time in a barrier.
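
To tie the three variables together, here is a sketch of how you might check the binding from inside a program. The file name, the build line, and the run line are hypothetical, and sched_getaffinity() is Linux-specific:

Code:
/* check_binding.c -- sketch of checking per-thread CPU affinity (Linux).
 *
 * Hypothetical build and run lines:
 *   pgcc -mp=numa check_binding.c -o check_binding
 *   MP_BIND=yes MP_BLIST=6,7 MP_SPIN=-1 ./check_binding
 *
 * With MP_BIND=yes, each thread should report an affinity mask that
 * contains a single CPU.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        /* A pid of 0 means "the calling thread" for sched_getaffinity(). */
        if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
            int cpu, count = 0;
            for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
                if (CPU_ISSET(cpu, &mask))
                    count++;
            printf("Thread %d: affinity mask contains %d CPU(s)\n",
                   omp_get_thread_num(), count);
        }
    }
    return 0;
}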

Hope this helps,
Mat

 