GPU Autodetect: How This Affects You

The PGI 18.7 release has a new feature intended to improve your experience for common cases when building programs for GPU computing. This affects both OpenACC (-acc and -ta=tesla) and CUDA Fortran. With PGI 18.7, the compiler will detect the CUDA driver and GPU compute capabilities on your system, and use the CUDA toolkit corresponding to that driver and generate code for the GPU (or GPUs) installed on your system. This matches the behavior that the PGI compilers have had for years with respect to the CPU version, where the compiler detects the type of CPU on which you are building your program (Haswell, Broadwell, Skylake, Zen, …) and optimizes and generates code for that CPU.
Previously, the PGI compilers were delivered with (usually) two versions of the CUDA toolkit, the latest version and the previous version. For instance, PGI 18.1 included the CUDA 8.0 and 9.0 toolkits. By default, the PGI compilers used the older of those toolkits, because code compiled by the newest toolkit would not run on a system with an older driver. We opted to provide better support for systems that had not updated their drivers yet. The downside of this was that newer features and optimizations available in the newer toolkit were not available by default. For instance, compute capability 7.0 (Volta) GPUs were supported in the CUDA 9.0 toolkit, but that was not the default toolkit. You had to add the cuda9.0 or the cc70 suboption to -ta=tesla or -Mcuda in order to generate code for a Volta.
Additionally, the PGI compilers generated code for all the compute capabilities supported by PGI and the default or selected toolkit. Unfortunately, generating a GPU binary is relatively slow, and the list of compute capabilities is growing (cc30, cc35, cc50, cc60, and now cc70). The increase in compile time was becoming a significant overhead.
Autodetect
With PGI 18.7, the compiler will detect the CUDA driver version on your system and build with the matching toolkit version. PGI 18.7 is delivered with CUDA 9.1 and CUDA 9.2 toolkits. If you have a system with a CUDA 9.2 driver, the compiler will use the CUDA 9.2 toolkit. If you still have a CUDA 9.1 driver, the compiler will use the CUDA 9.1 toolkit. When you upgrade your driver to 9.2, the compiler will automatically switch to use the newer toolkit.
Also, with PGI 18.7, the compiler will detect the GPU or GPUs on your system, and compile only for those GPU compute capabilities. If you have a Pascal or Volta GPU, the compiler will generate Pascal or Volta code, without wasting your time generating code for Kepler or Maxwell GPUs.
In most cases, the GPU Autodetect feature will provide the most up-to-date toolkit features and minimize your compile time, with no intervention on your part. However, there are some situations where you may need or want to override the default behavior: when cross compiling for a system with a different GPU, when compiling on a system with no GPU, or when you want to use a different toolkit than those delivered with the PGI compilers.
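When you do override the defaults, you name the target GPU by its compute capability suboption. As a rough aid, the sketch below maps the architecture names used in this article to the suboptions listed earlier (cc30, cc35, cc50, cc60, cc70). The function name cc_for_arch and the lowercase architecture labels are made up for this illustration; they are not part of the PGI compilers.

```shell
#!/bin/sh
# Illustration only: cc_for_arch is a hypothetical helper, not a PGI tool.
# It prints the compute capability suboption(s) for an architecture name,
# limited to the capabilities this article lists for PGI 18.7.
cc_for_arch() {
  case "$1" in
    kepler)  echo "cc30,cc35" ;;
    maxwell) echo "cc50" ;;
    pascal)  echo "cc60" ;;
    volta)   echo "cc70" ;;
    *)       echo "unknown architecture: $1" >&2; return 1 ;;
  esac
}

# Build an OpenACC target option for a Volta GPU.
echo "-ta=tesla:$(cc_for_arch volta)"   # prints -ta=tesla:cc70
```

On a system where autodetect works, none of this is needed; the mapping only matters when you spell out the target yourself.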
Cross Compiling
In the most common case, you are building and running your application on the same system. However, in some cases, such as on some of the largest supercomputers, you build your application on a front end system, then submit the job to run on the supercomputer. The front end system may have a different GPU or no GPU at all; it may even have a different CPU version. In that case, you will need to specify the GPU compute capability and (most likely) the CUDA toolkit version on the build line, either -ta=tesla:cc60,cuda9.1 (or appropriate) for OpenACC, or -Mcuda=cc60,cuda9.1 for CUDA Fortran.
If you are building an application for use by customers or users on a variety of other systems, where you don't know which GPU the user will run on, then you will need to decide which types of GPU and the minimum CUDA version that you will support for your users. You can specify a list of compute capabilities (cc35,cc60,cc70), or use the shorthand ccall. With -ta=tesla:ccall or -Mcuda=ccall, the compiler will generate code for all compute capabilities supported by PGI and the CUDA toolkit version being used.
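The cross-compiling options described above follow one pattern: a flag (-ta=tesla for OpenACC, -Mcuda for CUDA Fortran) followed by a comma-separated list of suboptions. The sketch below assembles that string so the two forms are easy to compare. The helper name gpu_option and the labels "openacc" and "cudafortran" are assumptions made for this example only, and the script just prints the option; it does not invoke the compiler.

```shell
#!/bin/sh
# Illustration only: gpu_option is a hypothetical helper, not a PGI tool.
# Usage: gpu_option MODEL [SUBOPTION...]
#   MODEL is "openacc" or "cudafortran" (labels invented for this sketch).
gpu_option() {
  case "$1" in
    openacc)     flag="-ta=tesla"; sep=":" ;;
    cudafortran) flag="-Mcuda";    sep="=" ;;
    *)           echo "unknown model: $1" >&2; return 1 ;;
  esac
  shift
  subopts=""
  for s in "$@"; do
    # Join suboptions with commas, without a leading comma.
    subopts="${subopts:+$subopts,}$s"
  done
  echo "${flag}${subopts:+${sep}${subopts}}"
}

# Cross compiling for a Pascal system with a CUDA 9.1 driver.
gpu_option openacc cc60 cuda9.1       # prints -ta=tesla:cc60,cuda9.1
gpu_option cudafortran cc60 cuda9.1   # prints -Mcuda=cc60,cuda9.1

# Building for several GPU generations, or for all of them.
gpu_option openacc cc35 cc60 cc70     # prints -ta=tesla:cc35,cc60,cc70
gpu_option cudafortran ccall          # prints -Mcuda=ccall
```

In a build script you might splice the result into the compile line, for example pgfortran $(gpu_option openacc cc60 cuda9.1) followed by your own source files.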
Building on Systems with No GPU
If you build on a system with no CUDA driver and don't specify the compute capabilities or the toolkit version on the compile line, the compiler will select the older of the CUDA versions delivered by PGI, and build for ccall, which matches the previous PGI behavior. As before, you can select the toolkit version and compute capabilities on the command line, or you can point the compiler at a specific toolkit with the CUDA_HOME setting explained below.
Systems with Older or Newer CUDA Drivers
The PGI compilers are delivered with two CUDA toolkit versions. The PGI 18.7 compilers come with CUDA 9.1 and CUDA 9.2 toolkits. If you are building on or for a system with an older driver, then you shouldn't use either of these toolkits. You should build with the toolkit version that is no newer than the CUDA driver version that will be used. If you are building on or for a system with a newer driver, then you can use either of these toolkit versions. However, you may want to try building with a newer toolkit version, if you want to take advantage of new features in the toolkit.
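The rule of thumb above (use a toolkit no newer than the driver, and prefer the newest one that qualifies) can be written down as a small script. This is only a sketch of the rule for choosing a toolkit by hand, not a description of how the PGI compiler is implemented. The function names are invented for this example, and versions are assumed to be written as MAJOR.MINOR.

```shell
#!/bin/sh
# Illustration only: these helpers are hypothetical, not part of PGI.

# ver_key VERSION
# Turn "MAJOR.MINOR" into one integer so versions compare numerically
# (9.2 -> 9002, 10.0 -> 10000). Assumes the MAJOR.MINOR form.
ver_key() {
  major=${1%%.*}
  minor=${1#*.}
  echo $(( $major * 1000 + $minor ))
}

# newest_usable_toolkit DRIVER TOOLKIT...
# Print the newest toolkit version that is no newer than DRIVER.
# Print nothing and return 1 if every toolkit is newer than the driver.
newest_usable_toolkit() {
  driver_key=$(ver_key "$1")
  shift
  best=""
  best_key=-1
  for tk in "$@"; do
    key=$(ver_key "$tk")
    if [ "$key" -le "$driver_key" ] && [ "$key" -gt "$best_key" ]; then
      best=$tk
      best_key=$key
    fi
  done
  [ -n "$best" ] && echo "$best"
}

# A CUDA 9.1 driver with the two toolkits shipped in PGI 18.7.
newest_usable_toolkit 9.1 9.1 9.2   # prints 9.1
```

With a CUDA 9.0 driver and only the 9.1 and 9.2 toolkits available, the function prints nothing, which corresponds to the case where you need an older toolkit than the ones delivered.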
PGI continues to test and support some CUDA toolkit versions that are older than the ones delivered. If you have an older 2018 PGI install that comes with, say, the CUDA 8.0 or CUDA 9.0 toolkits in the same install directory tree, those toolkits can still be used with the PGI 18.7 compilers, using the cuda8.0 or cuda9.0 options to -ta=tesla: or -Mcuda=.
Alternatively, if you have a CUDA toolkit that you've downloaded from the NVIDIA website and want the PGI compiler to use your CUDA toolkit directory, you can do that as well. One way is to give the install path to the CUDA toolkit on the compile line, with a setting like
pgfortran CUDA_HOME=/usr/local/cuda/9.0 -ta=tesla …
or whatever the appropriate path is. The compiler will look in that directory for the toolkit and determine the CUDA version. You can also set the environment variable
export CUDA_HOME=/usr/local/cuda/9.0
You may already have CUDA_HOME set for NVCC, in which case the PGI compilers will default to that toolkit version as well. If you want the PGI compilers to use a different CUDA toolkit, you can set the environment variable
export PGI_CUDA_HOME=/usr/local/cuda/9.0
instead.
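To make the interaction between the two environment variables concrete, here is a sketch of the lookup order the text above implies: PGI_CUDA_HOME if it is set, otherwise CUDA_HOME, otherwise the toolkits delivered with PGI. That ordering is inferred from this article rather than taken from the compiler itself, so treat it as an assumption. The function name pgi_toolkit_dir is invented for the example, and the sketch does not cover a CUDA_HOME= setting given on the compile line.

```shell
#!/bin/sh
# Illustration only: pgi_toolkit_dir is a hypothetical helper. It models the
# ASSUMED precedence PGI_CUDA_HOME > CUDA_HOME > bundled toolkits.
pgi_toolkit_dir() {
  if [ -n "${PGI_CUDA_HOME:-}" ]; then
    echo "$PGI_CUDA_HOME"
  elif [ -n "${CUDA_HOME:-}" ]; then
    echo "$CUDA_HOME"
  else
    echo "(toolkit bundled with PGI)"
  fi
}

# Show which directory the sketch would choose in the current environment.
pgi_toolkit_dir
```

This lets NVCC keep using the toolkit named by CUDA_HOME while the PGI compilers use the one named by PGI_CUDA_HOME.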
Note that errors can arise when you are using a CUDA toolkit that you've downloaded from the NVIDIA website. We generally make minor modifications to the toolkit to work with the compilers. If you are using a newer toolkit, you may run into incompatibilities that we have not yet addressed. Nevertheless, in most cases this works pretty well.
Summary
The GPU Autodetect feature in the PGI 18.7 release should make your most common use cases easier and decrease your compile time when using OpenACC or CUDA Fortran. As always, your feedback and suggestions are welcome.