Correct environment variables for MPI + OpenMP + OpenACC hybrid code

OpenACC and CUDA Fortran

Correct environment variables for MPI + OpenMP + OpenACC hybrid code

Post by Peter85 » Wed May 20, 2020 12:28 am

Hi,

I have a hybrid MPI + OpenMP + OpenACC code compiled with -ta=multicore.
When I run the code I see extremely low performance. The code does not contain nested OpenMP and OpenACC regions.
When I replace the OpenACC sections with OpenMP directives and remove -ta=multicore, I get the expected performance.
I suspect it is related to thread affinity, but I'm unable to figure it out. Do I need to set additional flags or environment variables?

Thank you for your help

I compile the code with the following flags:

Code: Select all

FFLAGS  = -fast -mcmodel=medium -mp -tp=skylake -m64 -cpp -Mmkl -acc -Minfo=acc  -ta=multicore
I run this on an Intel Skylake CPU and set the following environment variables:

Code: Select all

export ACC_NUM_CORES=20
export OMP_NUM_THREADS=40
mpirun  -n 1 -x UCX_MEMTYPE_CACHE=n ./prog


Re: Correct environment variables for MPI + OpenMP + OpenACC hybrid code

Post by mkcolg » Wed May 20, 2020 4:21 pm

Hi Peter,

I'll need to do a bit of research on this one. I've not tried mixing OpenMP and OpenACC targeting multicore myself, so I'm not entirely sure how the two runtimes interact.

Also, I see you are using MKL, which may itself use OpenMP. Are you calling MKL from inside any OpenACC regions?
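If it is, one quick thing to try (just a sketch, assuming the threaded MKL and a bash shell) is capping MKL's internal threading so it can't oversubscribe the cores on top of the OpenACC/OpenMP threads:

Code: Select all

export MKL_NUM_THREADS=1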

If at all possible, a reproducing example would be very welcome so I can replicate and then investigate the issue here.

Thanks,
Mat


Re: Correct environment variables for MPI + OpenMP + OpenACC hybrid code

Post by Peter85 » Wed May 20, 2020 6:53 pm

Hi Mat,

Thank you for your answer. I experimented a bit and found that if I pass --bind-to socket to mpirun, the code performs correctly.
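For reference, here is the full launch line that works for me (the same single-rank setup as above, just with the binding option added):

Code: Select all

export ACC_NUM_CORES=20
export OMP_NUM_THREADS=40
mpirun --bind-to socket -n 1 -x UCX_MEMTYPE_CACHE=n ./prog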

No, MKL is not called from within OpenACC regions.

I have another question regarding ACC_NUM_CORES and OMP_NUM_THREADS. What is the relationship between these two variables?
When users run a hybrid code, do they have to set ACC_NUM_CORES to the number of cores and OMP_NUM_THREADS to the number of threads? When ACC_NUM_CORES is not set, what default value is used?

Thank you for your help.


Re: Correct environment variables for MPI + OpenMP + OpenACC hybrid code

Post by mkcolg » Thu May 21, 2020 9:23 am

When users run a hybrid code, do they have to set ACC_NUM_CORES to the number of cores and OMP_NUM_THREADS to the number of threads?
Correct. They are independent.
When ACC_NUM_CORES is not set, what default value is used?
The OpenACC runtime will default to using all the physical cores on the system, while OpenMP defaults to the total number of logical cores (including hyper-threads).

Here's a simple example to illustrate. I'm running on a two-socket Skylake system with 20 physical cores per socket and 2 hyper-threads per core.

Code: Select all

% cat acc_mp.c
#include <stdio.h>
#include <omp.h>
#define N 1000
int main() {
  int v[N];
  /* OpenACC loop: with -ta=multicore it runs across host cores,
     and ACC_NUM_CORES controls how many threads are used */
#pragma acc parallel loop
  for(int i = 0; i < N; ++i) {
    v[i] = i;
    if(i == 0) {
      printf("ACC: #threads: %d\n", omp_get_num_threads());
    }
  }
  /* Plain OpenMP region: thread count comes from OMP_NUM_THREADS */
#pragma omp parallel
  {
#pragma omp single
    {
      printf("OMP #threads: %d\n", omp_get_num_threads());
    }
  }
}
% pgcc -mp -acc -ta=multicore acc_mp.c
% echo $OMP_NUM_THREADS
OMP_NUM_THREADS: Undefined variable.
% echo $ACC_NUM_CORES
ACC_NUM_CORES: Undefined variable.
% a.out
ACC: #threads: 40
OMP #threads: 80
% setenv ACC_NUM_CORES 10
% setenv OMP_NUM_THREADS 20
% a.out
ACC: #threads: 10
OMP #threads: 20
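A side note: the transcript above uses csh (setenv); in a bash shell the equivalent would be:

Code: Select all

export ACC_NUM_CORES=10
export OMP_NUM_THREADS=20
./a.out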
Hope this helps,
Mat


Re: Correct environment variables for MPI + OpenMP + OpenACC hybrid code

Post by Peter85 » Mon May 25, 2020 1:15 am

mkcolg wrote:
Thu May 21, 2020 9:23 am
Correct. They are independent. The OpenACC runtime will default to using all the physical cores on the system, while OpenMP defaults to the total number of logical cores (including hyper-threads).
Thanks for your answer. I will check it.
