Request support/help for PBS with OpenMPI


Re: Request support/help for PBS with OpenMPI

Post by mkcolg » Thu Aug 08, 2019 1:13 pm

Hi Ron,

This is a bit beyond either my or Chris's area of expertise, but Chris is going to reach out to other folks within NVIDIA to see if they have any ideas.

Also, you may want to see whether NASA can get in contact with the NVIDIA Solution Architect (SA) assigned to their account (I'm not sure who it is, though). SAs should be better able to provide insight into hardware and network issues.



Re: Request support/help for PBS with OpenMPI

Post by cparrott » Thu Aug 08, 2019 2:09 pm

Hi Ron,

Just FYI - we are primarily a compiler development and support organization here, so some of your concerns regarding Open MPI performance fall a bit outside our area of experience. However, I have reached out to some other people within NVIDIA who can hopefully help with the concerns you have raised here.

One of our Open MPI engineers immediately responded with two suggestions that I wanted to pass along:

1. He is concerned that your PLEIADES cluster may not be using InfiniBand for the Open MPI transport. Here is what he said:
Reading the thread, this comment is worrying:

« So it turns out the openmpi was in fact detecting the infiniband (I think) as the code does run on multiple nodes (just slow). »

No, this is actually a strong sign that InfiniBand was not detected and he is running over TCP/IP.

This is how you check that the openib BTL is present (in Open MPI 1.10.7) and force it to be used:

Code: Select all

$ ompi_info | grep openib
                 MCA btl: openib (MCA v2.0.0, API v2.0.0, Component v1.10.7)
$ mpirun -mca btl openib,smcuda,self ...
And this is for UCX (in Open MPI 4.0.0):

Code: Select all

$ ompi_info | grep ucx
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.0)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.0)
$ mpirun -mca pml ucx ...
2. He also mentioned that the previous InfiniBand support in Open MPI is deprecated as of Open MPI 4.x, and the Open MPI developers recommend that everyone switch to UCX and build Open MPI against it. Going forward, UCX will manage the InfiniBand transport for Open MPI, rather than Open MPI managing it directly. Based on their recommendations, we are planning to update our bundled Open MPI to a 4.x release and change its configuration to use UCX in the 20.1 release, due in early 2020.
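One additional check that may help (my own suggestion, not from his notes): when testing a UCX-enabled build, you can force the UCX PML and raise the selection verbosity so that a silent fallback to TCP is easy to spot. The verbosity level and application name here are just placeholders:

Code: Select all

$ mpirun -mca pml ucx -mca pml_base_verbose 10 ./my_app ...
If the UCX PML cannot be selected (for example, because Open MPI was not built against UCX), the run aborts instead of quietly falling back.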

This link provides a good overview of how to build UCX and link Open MPI against it: ... n-with-UCX

Note that you do not need to clone the Open MPI source from their GitHub repo as described in the OpenMPI and OpenSHMEM Installation section; a release tarball such as Open MPI 4.0.1 should work fine. You will need to make sure you are using an up-to-date version of UCX, though.
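If you want a quick way to confirm which UCX installation your build will pick up, the ucx_info utility that ships with UCX reports its version and build configuration (output will vary by install):

Code: Select all

$ which ucx_info
$ ucx_info -v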

Hopefully this will give you some ideas to try, going forward.

Good luck,



Re: Request support/help for PBS with OpenMPI

Post by cparrott » Thu Aug 08, 2019 2:15 pm

Hi Ron,

Here is a step-by-step guide to configuring Open MPI with UCX that another NVIDIA engineer just passed along to me:
1. Setup UCX

a. Download and build gdrcopy (optional but recommended)
git clone # (see instructions in
cd gdrcopy/
sudo make PREFIX=/usr CUDA=/usr/local/cuda all install
sudo ./

# Copy library .so and header to /usr or wherever you decide GDRCOPY_HOME is
sudo cp /usr/lib64/
sudo cp /usr/lib64/
sudo cp /usr/lib64/
sudo cp gdrapi.h /usr/include
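As an optional sanity check (my addition, not part of the engineer's steps), you can confirm that the gdrdrv kernel module installed by gdrcopy is actually loaded before building UCX against it:
lsmod | grep gdrdrv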

b. Download UCX
Either download the latest release here: or clone master branch
git clone

* Note on picking the closest HCA to a GPU used by a UCX process:
On machines with multiple HCAs, the UCX 1.5.x series is recommended for its GPU-HCA affinity support.
The last known 1.5.x release is here:
wget ... 5.2.tar.gz
tar xfz ucx-1.5.2.tar.gz
cd ucx-1.5.2

c. Build UCX with CUDA support

./ # if configure isn't present
sudo apt-get install libnuma-dev # if libnuma-dev isn't installed
./configure --prefix=$UCX_HOME --with-cuda=$CUDA_HOME --with-gdrcopy=$GDRCOPY_HOME --enable-mt
sudo make -j install

# Export paths to access binaries and libraries later
export PATH=$UCX_HOME/bin:$PATH
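Two optional additions of my own here: if your UCX libraries landed in $UCX_HOME/lib, exporting LD_LIBRARY_PATH as well saves trouble later, and ucx_info can confirm the CUDA transports were built in (cuda_copy and cuda_ipc should show up):
export LD_LIBRARY_PATH=$UCX_HOME/lib:$LD_LIBRARY_PATH
ucx_info -d | grep -i cuda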

2. Setup OpenMPI

a. Download OpenMPI
Either download the latest release here: (recommended)
wget ... 0.1.tar.gz
tar xfz openmpi-4.0.1.tar.gz
cd openmpi-4.0.1

or clone master branch
git clone

b. Build OpenMPI with CUDA support
./ # if configure isn't present
./configure --prefix=$OMPI_HOME --enable-mpirun-prefix-by-default --with-cuda=$CUDA_HOME --with-ucx=$UCX_HOME --with-ucx-libdir=$UCX_HOME/lib --enable-mca-no-build=btl-uct --with-pmix=internal
sudo make -j install
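Optional checks of my own before moving on: ompi_info should now report the ucx components, and the documented CUDA-support query should return true:
ompi_info | grep ucx
ompi_info --parsable --all | grep mpi_built_with_cuda_support:value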

3. Run osu-micro-benchmarks (OMB)

a. Download OMB
Either from here (recommended) or clone
git clone

b. Build OMB
../configure --enable-cuda --with-cuda-include=$CUDA_HOME/include --with-cuda-libpath=$CUDA_HOME/lib64 CC=$MPI_HOME/bin/mpicc CXX=$MPI_HOME/bin/mpicxx --prefix=$PWD
make -j install

c. Run OMB
cat hostfile
hsw210 slots=1 max-slots=1
hsw211 slots=1 max-slots=1
mpirun -np 2 --hostfile $PWD/hostfile --mca pml ucx -x UCX_MEMTYPE_CACHE=n -x UCX_TLS=rc,mm,cuda_copy,gdr_copy,cuda_ipc -x LD_LIBRARY_PATH $PWD/get_local_ompi_rank $PWD/mpi/pt2pt/osu_bw D D
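Since your runs go through PBS, here is a minimal batch-script sketch of how the same osu_bw run might be submitted. The select/ncpus/ngpus values are placeholders that will need to match your nodes, and it assumes $OMPI_HOME and $UCX_HOME are set as above:

#PBS -N osu_bw_ucx
#PBS -l select=2:ncpus=1:ngpus=1
#PBS -l walltime=00:10:00

cd $PBS_O_WORKDIR
export PATH=$OMPI_HOME/bin:$PATH
export LD_LIBRARY_PATH=$OMPI_HOME/lib:$UCX_HOME/lib:$LD_LIBRARY_PATH

# PBS lists the allocated nodes in $PBS_NODEFILE, so no hand-written hostfile is needed
mpirun -np 2 --hostfile $PBS_NODEFILE --mca pml ucx \
    -x UCX_MEMTYPE_CACHE=n -x LD_LIBRARY_PATH \
    ./mpi/pt2pt/osu_bw D D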
Hope you find this helpful.



Re: Request support/help for PBS with OpenMPI

Post by sumseq » Thu Sep 05, 2019 5:31 pm


Here is an update on my issues on Pleiades running multi-node GPU runs:

1) Using OpenMPI 4.0.1 or 4.0.2rc1 with PBS and UCX (latest stable or latest) and PGI 18.10 causes my code to crash. On certain types of runs it works, and my other GPU code works, but it crashes in one specific routine. That routine works fine in all my other multi-node GPU tests on other systems, as well as in the tests below, so I do not think it is a code bug but rather a UCX bug (a similar error message appears in their online bug reports).

2) I was unable to compile OpenMPI 3.x using PGI 18.10 - I got compilation errors.

3) Using OpenMPI 2.1.2 with PGI 18.10, compiled with PBS and verbs support, WORKS on multi-node! Yay!
The timing result on two 4xV100 nodes is similar to that on a single 8xV100 node (exactly the same computation time, and a tad slower MPI time, as expected).

4) Using OpenMPI 4.0.2rc1 with PGI 18.10 compiled with PBS and verbs also works (no crashes) but is VERY slow. It also spits out:

Code: Select all

[r101i0n2:16086] 15 more processes have sent help message help-mpi-btl-openib.txt / ib port not selected
[r101i0n2:16086] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[r101i0n2:16086] 15 more processes have sent help message help-mpi-btl-openib.txt / error in device init
I assume this has to do with OpenMPI deprecating verbs (openib) support in the 4.x series.


Basically, for my code there seems to be a bug in UCX that prevents me from using OpenMPI 4.x, but I can get my runs done using OpenMPI 2.x with verbs. (I assume PGI 19.x can compile OpenMPI 3.x, since that is what it ships with, so I will also assume 19.x will work with verbs on PBS.)

You said that PGI will start bundling OpenMPI 4.x with UCX in the next release, but please take this test into account, as that combination currently crashes my runs.

I can provide a reproducer that you can test with if you would like.

- Ron


Re: Request support/help for PBS with OpenMPI

Post by cparrott » Fri Sep 06, 2019 1:57 pm

Hi Ron,

Thanks for the feedback.

I have seen some of the same issues with Open MPI + UCX myself, and have raised them with our internal contacts for Open MPI and Mellanox. I will check whether they are aware of this particular issue. Things seem to be in a bit of flux right now: Open MPI and UCX development is evolving rapidly, and the two do not always stay in sync. Much of this is beyond our direct control within the PGI group, so all we can do is make sure the appropriate people are aware of the issues and hope they are addressed soon.


