PGI MPI Questions

What should I know to get an MPI cluster working?

A cluster is a set of machines connected by some communication interface. All of them run the same operating system, and all are configured to allow communication to and from every other member. There is a single master node and a number of slave nodes.

The MPI libraries must be located in the same path on each of the nodes. This can be accomplished by duplicating the paths on each node, or by configuring a commonly mounted disk area that all the cluster nodes see as the same path.

To use the MPICH1 included with the PGI Workstation installation for Linux, first add your MPICH bin directory to your $PATH. You can choose between MPI libraries built for SSH or RSH. For a job to run successfully, any node N must be able to communicate with any other node M. For example, from NodeN execute the following command to obtain the hostname of NodeM:

rsh  NodeM  'uname -n'

Note that NodeN should be able to communicate with itself in the same manner.
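The connectivity check above can be repeated for every node with a small helper. This is only a sketch: the function name check_nodes and the node names in the usage line are hypothetical, and it tests reachability only from the node where it is run, so it would need to be run on each node in turn to cover every N-to-M pair.

```shell
#!/bin/sh
# check_nodes REMOTE_SHELL NODE...
# Hypothetical helper: runs 'uname -n' on each listed node through the
# given remote shell command (for example rsh or ssh) and reports the
# nodes that did not answer.
check_nodes() {
    remote_shell=$1
    shift
    failed=0
    for node in "$@"; do
        # Some rsh implementations do not pass back the remote exit
        # status, so an empty reply is also treated as a failure.
        if name=$("$remote_shell" "$node" 'uname -n' 2>/dev/null) && [ -n "$name" ]; then
            echo "ok: $node reports hostname $name"
        else
            echo "FAILED: cannot reach $node with $remote_shell"
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as `check_nodes rsh NodeN NodeM` includes the local node in the list, which also covers the requirement that a node can reach itself.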

A list of nodes needs to be provided at runtime, either from a default list or from a list supplied when the job is started. If no list is available, the master node runs all the processes.

Each process in an MPI application has its own memory space, separate from that of the other N-1 processes. The MPI libraries enable you to pass information between processes in the form of messages conforming to the MPI requirements.

If a program runs successfully with one MPI library, it should also run successfully with another MPI library, provided that library is compatible with your cluster hardware configuration.

How do I use the standard MPICH1 installation that comes with the Linux PGI compilers?

To use the MPICH1 installed in the PGI Workstation installation for Linux, you first need to add your MPICH bin directory to your $PATH. The commands below assume that the PGI compilers have already been added to your $PATH and that $PGI has been defined.

export PATH=$PGI/linux86-64/2012/mpi/mpich/bin:$PATH

or in csh

set path=($PGI/linux86-64/2012/mpi/mpich/bin $path)
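After changing the path, it can be useful to confirm that the MPICH tools (mpif90, mpicc, mpicxx, and mpirun) resolve to the directory you expect. The helper below is a minimal sketch for sh-compatible shells; the function name check_tools is hypothetical and is not part of the PGI or MPICH installation.

```shell
#!/bin/sh
# check_tools TOOL...
# Hypothetical helper: prints where each named tool resolves on the
# current PATH and returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            echo "$tool -> $location"
        else
            echo "$tool not found on PATH"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, to be run in a shell where the PATH has been set:
# check_tools mpif90 mpicc mpicxx mpirun
```

If a tool resolves to a directory other than the PGI MPICH bin directory, another MPI installation probably appears earlier in the path.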

Now that the MPICH bin directory is in your path, you have access to mpif90, mpicc, and mpicxx, along with the execution tool, mpirun. Note that the list that determines which machines run the N processes is found by default in $PGI/linux86-64/2012/mpi/mpich/share/machines.LINUX. Often there is only one machine in the list. You can edit this file and add the names of the other machines in your cluster, or you can create your own cluster list file and direct mpirun to use it:

mpirun -np N  -machinefile MY_Machine_List  a.out 
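A custom list file like the one passed to -machinefile above can be written by hand or generated. The sketch below assumes the simplest layout, one host name per line; the function name write_machine_list and the node names in the usage lines are hypothetical.

```shell
#!/bin/sh
# write_machine_list FILE NODE...
# Hypothetical helper: writes the given node names to FILE, one per
# line, replacing any previous contents of FILE.
write_machine_list() {
    list_file=$1
    shift
    printf '%s\n' "$@" > "$list_file"
}

# Example use (node1..node3 are placeholder host names):
# write_machine_list MY_Machine_List node1 node2 node3
# mpirun -np 3 -machinefile MY_Machine_List a.out
```

Keeping a separate list file leaves the default machines.LINUX untouched, which may be preferable when several users share one installation.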