PGI User Forum


PGI 6.0 on AMD64: numa and numactl ?

 
cmn



Joined: 13 Sep 2005
Posts: 2

Posted: Thu Oct 13, 2005 1:17 am    Post subject: PGI 6.0 on AMD64: numa and numactl ?

Hi,
I am using the PGI 6.0-5 compilers and CDK on an AMD64 platform,
under a Linux 2.6.9 kernel (Red Hat 4.0 derived).
As noted in my previous messages on this forum, I am seeing
performance regressions compared with results under the older
2.4 kernel.
Since the codes I am testing are MPI-based rather than
OpenMP or autopar, the compiler's numa option is ineffective,
if I understand the release notes correctly.
Therefore, I would like to test the numactl tools and
library in order to squeeze more performance
from the system (HW and SW stack).

Do you have any suggestions regarding the use
of numactl together with PGI 6.0 on the AMD64 architecture?
Are they largely independent, so that I can safely
experiment with numactl options, or do they
interact in some subtle way?

Thanks!

--
cmn
mkcolg



Joined: 30 Jun 2004
Posts: 6139
Location: The Portland Group Inc.

Posted: Thu Oct 13, 2005 10:47 am

Hi cmn,

The numactl tool is independent of the compilers. However, you can use the "-mp=numa" flag during linking (even for non-OpenMP programs) to have your program linked with the system's NUMA libraries. Either method, numactl or "-mp=numa", has largely the same effect on your program; the main difference is how you run it.
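For example, linking with "-mp=numa" would look something like this (a minimal sketch; the source and program names are placeholders, and you would substitute pgcc or pgCC for C/C++ codes):

    pgf90 -o myprog.exe myprog.f90 -mp=numa    # link against the system NUMA libraries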

With numactl, you simply set the environment variable 'NCPUS' to the number of threads to run and then launch your program with 'numactl <options> myprog.exe'. The "-c n" flag specifies the node(s) to run on. In a multi-CPU system, 1 node equals 1 CPU; in a dual-core system, 1 node equals 2 CPUs. If you need finer-grained control on dual-core systems, you'll also need to use 'taskset <hexmask>' to pin the process to specific CPUs. Numactl's "-m n" flag indicates the node to which your program's memory is locked. Typically, this is the same node(s) you specified with "-c". Instead of locking the memory to a particular node, you can specify "--interleave" to have the memory interleaved across all available nodes. This can help memory-bound codes that need a lot of throughput, but in general you should lock the memory to a node.
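As a minimal sketch of those invocations ('myprog.exe' is a placeholder, and the option spellings follow this post; they differ between numactl versions, so check 'numactl --help' on your system):

    export NCPUS=2                            # threads for the PGI runtime
    numactl -c 0 -m 0 ./myprog.exe            # run on node 0, lock memory to node 0
    numactl --interleave=all ./myprog.exe     # or: interleave memory across all nodes
    taskset 0x3 ./myprog.exe                  # finer control: pin to CPUs 0 and 1 via hex mask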

If you are using "-mp=numa", you set NCPUS as before, but instead use the environment variables "MP_BIND" and "MP_BLIST" to tell the runtime whether the program should be bound to nodes and which CPUs to bind the threads to. The syntax is "MP_BIND=yes|no" to indicate whether the program should be bound (the default is no), and "MP_BLIST=0,1,2,.." lists the CPUs on which to run the threads. There is no equivalent to "--interleave". More on this can be found in the 6.0 release notes.
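Sketched out, the equivalent setup under "-mp=numa" would be (again, 'myprog.exe' is a placeholder):

    export NCPUS=2         # number of threads to run
    export MP_BIND=yes     # bind the threads (default is no)
    export MP_BLIST=0,1    # run the threads on CPUs 0 and 1
    ./myprog.exe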

My experience using numactl has largely been with auto-parallelization (-Mconcur) running on a quad dual-core system (8 CPUs). So unfortunately, I don't have much advice for running with MPI, other than to say I don't think either method would help, since both are for use on a single multi-CPU system rather than a cluster. If you went with a hybrid model (MPI/OpenMP) then there might be some use, but other than that, I doubt it.

- Mat