Technical News from The Portland Group

Using NVIDIA GPU Accelerators with PGI Visual Fortran

The PGI 2010 release of PGI Visual Fortran (PVF) includes many new features to simplify GPU acceleration of multicore x64 Windows Fortran applications. This article shows how to use the PGI Accelerator programming model and CUDA Fortran from within PVF, and provides a few hints for getting the most out of your GPU-enabled Windows systems. The two methods outlined in this article are not mutually exclusive: a single Visual Studio solution can contain both PGI Accelerator and CUDA Fortran files.

This article does not intend to teach you how to use PGI Accelerator directives or how to program in CUDA Fortran. If you need a tutorial introduction to either of these models, check out Michael Wolfe's series of articles on PGI Accelerator programming, or the more recent articles introducing CUDA Fortran programming and CUDA Fortran data management.

If you don't currently have PGI Visual Fortran, you can download it today and try out some of the features described below using a free 15-day trial license. If you've never used Microsoft Visual Studio before, you should get started by walking through the basic introduction in the "Getting Started with PVF" chapter in the PVF User's Guide.

PVF 2010 version 10.6, due in early June, will contain two new sample solutions for GPU programming. Both samples are based on matrix multiply. The first uses the PGI Accelerator programming model to target GPUs and the second uses CUDA Fortran. Simply open the provided solution files, build and run. Version 10.6 will also be the first release of PVF to include support for Microsoft Visual Studio 2010.

The PGI Accelerator Programming Model

The PGI Accelerator programming model is a directive-based method for guiding the compiler to automatically translate loop nests into code (called kernels) for GPUs. The directives are enabled via a compiler option; without the option, or when compiling the code with another compiler, the directives are treated as comments and ignored. This property of a directive-based model allows source code to remain compiler neutral and portable. However, as is often the case when migrating applications to new hardware platforms, optimizing code for maximum performance on GPU targets usually requires structural or algorithmic changes beyond the addition of PGI Accelerator directives. On non-GPU targets (specifically a multi-core x64 host), these changes may have no performance impact, may improve performance, or in some cases may decrease performance. You need to be aware of these possibilities and structure your code to preserve both portability and performance portability. While they don't necessarily make GPU programming easy per se, PGI Accelerator directives are often the fastest and easiest way to test whether your code might benefit from GPU acceleration.
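As an illustration of the directive-based approach, here is a minimal sketch of a matrix multiply wrapped in an accelerator region. It assumes the `!$acc region` / `!$acc end region` form of the PGI Accelerator Fortran directives. The program name, array names and problem size are made up for this example and are not taken from the PVF sample solutions.

```fortran
program acc_matmul_sketch
  implicit none
  integer, parameter :: n = 64          ! hypothetical problem size
  real :: a(n,n), b(n,n), c(n,n)
  integer :: i, j, k

  ! Fill the inputs with simple known values.
  a = 1.0
  b = 2.0

  ! Loop nest the compiler may translate into a GPU kernel when the
  ! PGI Accelerator directives are enabled. Without that option, or
  ! with another compiler, the two directive lines are plain comments.
!$acc region
  do j = 1, n
    do i = 1, n
      c(i,j) = 0.0
      do k = 1, n
        c(i,j) = c(i,j) + a(i,k) * b(k,j)
      end do
    end do
  end do
!$acc end region

  ! With these inputs every element of c should equal 2*n.
  print *, 'c(1,1) =', c(1,1), ' expected', 2.0 * real(n)
end program acc_matmul_sketch
```

Because the directives are comments to a compiler that does not recognize them, the same source should build and run unchanged on the host, which makes it easy to compare results with and without acceleration.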

Target Accelerators Property Page

The PGI Accelerator programming model directives are enabled in a PVF project using the Target Accelerators property page. By default the Target Accelerators property page looks like this:

[Figure: Target Accelerators property page with default settings]

PGI compilers currently support only NVIDIA GPU accelerators, but the interface is designed so that other accelerator targets—for example ATI Stream accelerators or even high core count multicore x64—can be easily added in future releases. To enable the PGI Accelerator programming model directives for NVIDIA, set the Target NVIDIA Accelerator property to Yes:

[Figure: Target NVIDIA Accelerator property set to Yes]

At this point, additional properties should appear on the Target Accelerators property page. If not, click Apply to update the properties.

To enable code generation for NVIDIA GPUs, PVF adds the compiler flag -ta=nvidia to compilation and linking. Each of the sub-options to the -ta compiler flag has a corresponding property on the Target Accelerators property page. Most of these properties are straightforward and self-describing, and all of the properties are described in detail in the PVF User's Guide. We'll go into depth with a couple of important properties here.

NVIDIA CUDA Toolkit Property

Together, a CUDA Toolkit and device driver define the CUDA software and runtime environment on an NVIDIA GPU-enabled system. The PGI Accelerator Fortran compiler embedded within PVF 2010 versions 10.4 and higher can generate code compatible with two different versions of NVIDIA's CUDA Toolkit: 2.3 and 3.0. These in turn are compatible with corresponding versions of NVIDIA's CUDA-enabled device drivers. The pgaccelinfo tool, available in the PVF Command Shell, prints the version of the CUDA driver installed on your system as the first line of its output:

      For a 2.3 driver: CUDA Driver Version 2030
      For a 3.0 driver: CUDA Driver Version 3000

Note that compiling with the CUDA 3.0 toolkit generates binaries that may not work on machines with a CUDA 2.3 device driver. For more information on CUDA toolkit versions and why you might want to change the default, refer to the article PGI Accelerator Programming Model Support on NVIDIA Fermi GPUs elsewhere in this issue of the PGInsider.
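The four-digit driver version follows CUDA's usual encoding of 1000 × major + 10 × minor, which matches both lines shown above. The short sketch below decodes such a value and flags a driver older than 3.0. The value is typed in by hand here purely for illustration; the program does not call pgaccelinfo or query the driver itself.

```fortran
program driver_version_sketch
  implicit none
  integer :: raw, major, minor

  raw = 2030                      ! hypothetical: copied from pgaccelinfo output
  major = raw / 1000              ! integer division: 2030 / 1000 = 2
  minor = mod(raw, 1000) / 10     ! mod(2030, 1000) = 30, and 30 / 10 = 3
  print '(a,i0,a,i0)', 'CUDA driver version ', major, '.', minor

  ! A binary built with the 3.0 toolkit may not run on an older driver.
  if (raw < 3000) then
    print '(a)', 'Driver predates 3.0: consider targeting the 2.3 toolkit.'
  end if
end program driver_version_sketch
```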

To let the compiler select which toolkit to use, set the NVIDIA: CUDA Toolkit property to Default; this is PVF's default setting for this property. To instruct the compiler to target a specific toolkit, select the desired toolkit version.

[Figure: NVIDIA: CUDA Toolkit property]

NVIDIA Compute Capability Property

The CUDA hardware environment on an NVIDIA GPU-enabled system is defined by its compute capability. The hardware compute capability is basically a hardware revision number, and can be used to determine whether the device supports double-precision, how many threads can run concurrently, etc. The PGI Accelerator Fortran compiler supports all versions of NVIDIA's compute capability (1.0, 1.1, 1.2, 1.3 and 2.0). By default, the compiler will generate code that will work with all applicable compute capabilities supported in the targeted CUDA Toolkit. That's what the Automatic setting for this property means. If you want to manually select the compute capabilities for which the compiler generates code, set this property to Manual:

[Figure: NVIDIA: Compute Capability property set to Manual]

If you don't see the additional compute capability properties (NVIDIA: CC 1.0, NVIDIA: CC 1.1, etc.) appear on the Target Accelerators property page after changing the NVIDIA: Compute Capability property to Manual, click Apply to update the properties.

For each compute capability version in the list, set its property to Yes if you want the compiler to generate code targeting that version. You can select as many of the compute capability versions as you need, and the compiler will use PGI Unified Binary technology to generate a single binary executable including kernels that can be launched on any of the specified types of GPU hardware.

Compute capability is discussed in more detail in the PGI Accelerator Programming Model Support on NVIDIA Fermi GPUs article mentioned above. Manually choosing the compute capability to target is a decision that must be based on knowledge of the environments in which your application will run.
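If you are unsure which compute capability a given machine provides, one option is to ask the CUDA runtime directly. The sketch below is a CUDA Fortran program (so it needs CUDA Fortran enabled, as described later in this article) that uses `cudaGetDeviceCount` and `cudaGetDeviceProperties` from the `cudafor` module. The program name and output wording are made up for this example; the double-precision check relies on the fact that double precision was introduced with compute capability 1.3.

```fortran
program query_compute_capability
  use cudafor                         ! CUDA Fortran runtime API module
  implicit none
  type(cudaDeviceProp) :: prop
  integer :: istat, ndev, dev

  ndev = 0
  istat = cudaGetDeviceCount(ndev)
  if (istat /= cudaSuccess .or. ndev == 0) then
    print '(a)', 'No CUDA device found.'
    stop
  end if

  ! CUDA device numbers start at 0.
  do dev = 0, ndev - 1
    istat = cudaGetDeviceProperties(prop, dev)
    if (istat /= cudaSuccess) cycle
    print '(a,i0,a,i0,a,i0)', 'Device ', dev, &
          ': compute capability ', prop%major, '.', prop%minor
    ! Double precision requires compute capability 1.3 or higher.
    if (prop%major > 1 .or. (prop%major == 1 .and. prop%minor >= 3)) then
      print '(a)', '  double precision supported'
    end if
  end do
end program query_compute_capability
```

The pgaccelinfo tool mentioned earlier reports similar information without requiring any code.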

CUDA Fortran

CUDA Fortran consists of a small set of extensions to Fortran, and is the Fortran analog to NVIDIA's CUDA C. These extensions are built upon and support NVIDIA's CUDA computing architecture, which is a general purpose parallel programming architecture with compilers and libraries to support the programming of NVIDIA GPUs. For an introduction to CUDA Fortran, refer to the CUDA Fortran Programming Guide and Reference.

PGI Visual Fortran fully supports CUDA Fortran, including property pages for enabling and configuring CUDA Fortran compilation and support for CUDA Fortran files with the .cuf filename extension.
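To give a flavor of what a .cuf file contains, here is a minimal sketch of a CUDA Fortran kernel and the host code that launches it. The module, kernel and variable names are invented for this example, and the block size of 256 threads is an arbitrary choice rather than a recommendation.

```fortran
module scale_kernels
  use cudafor
  implicit none
contains
  ! Kernel: each thread scales one element of x, which lives in device memory.
  attributes(global) subroutine scale(x, s, n)
    integer, value :: n
    real, value :: s
    real :: x(n)
    integer :: i
    ! Global 1-based index of this thread.
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) x(i) = s * x(i)
  end subroutine scale
end module scale_kernels

program cuf_sketch
  use cudafor
  use scale_kernels
  implicit none
  integer, parameter :: n = 1024        ! hypothetical array size
  real :: x(n)
  real, device :: x_d(n)                ! device-resident copy of x

  x = 1.0
  x_d = x                               ! host-to-device copy by assignment
  ! Launch enough 256-thread blocks to cover all n elements.
  call scale<<<(n + 255) / 256, 256>>>(x_d, 2.0, n)
  x = x_d                               ! device-to-host copy by assignment
  print *, 'x(1) =', x(1), ' x(n) =', x(n)
end program cuf_sketch
```

The `device` attribute and the `<<<...>>>` launch syntax are the kind of extensions the CUDA Fortran Programming Guide and Reference covers in detail.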

Language Property Page

To enable CUDA Fortran, first open the Fortran | Language property page:

[Figure: Fortran | Language property page]

Then set the Enable CUDA Fortran property to Yes:

[Figure: Enable CUDA Fortran property set to Yes]

If you don't see the additional CUDA Fortran properties appear on the Language property page after enabling CUDA Fortran, click Apply.

When CUDA Fortran is enabled, PVF adds the compiler flag -Mcuda to compilation and linking. Each of the sub-options to the -Mcuda option has a corresponding property on the Language property page. Most of these properties are straightforward and self-describing, and all of the CUDA Fortran properties are described in the PVF User's Guide.

If you've been following along throughout this article, then these properties probably look familiar. That's because the CUDA Fortran properties closely resemble the NVIDIA properties on the Target Accelerators page. In particular, the CUDA Fortran Toolkit and CUDA Fortran Compute Capability properties work the same way that the NVIDIA: CUDA Toolkit and NVIDIA: Compute Capability properties do.

CUDA Fortran Files

While any Fortran file can contain CUDA Fortran code, the .cuf file extension can be used to further designate CUDA Fortran files. You can add new .cuf files to a PVF project using the Add New Item dialog, and you can add existing .cuf files using the Add Existing Item dialog.

Although PVF recognizes .cuf files as CUDA Fortran, you still need to enable CUDA Fortran compilation via the property pages to ensure that the correct libraries are added to the link step.

Syntax Colorization

When displaying CUDA Fortran in the editor, PVF adds syntax colorization for CUDA Fortran keywords. Look for the shared and device keywords in the next screenshot:

[Figure: CUDA Fortran syntax colorization in PVF]

To take advantage of this feature, either the file in the editor must be a .cuf file or CUDA Fortran must be enabled in the project properties.

Diagnostic Properties

The PGI compilers can report the optimizations they attempt as they compile, but only if you enable this diagnostic output. PVF's Diagnostics property page is where you do so. When targeting accelerators, the Accelerator Information property is the first one you will want to enable.

[Figure: Accelerator Information property on the Diagnostics property page]

The compiler output will be shown in the Output Window. It will look similar to:

[Figure: compiler output in the Output Window]

Use this information as you continue to tune your code, analyzing the compiler feedback and modifying your code and loops as needed to enable generation of GPU accelerator kernels.

PGI Accelerator Environment Variables

There are several environment variables you can use to control the behavior of PGI Accelerator GPU-enabled programs at execution and modify the behavior of accelerator regions. Refer to the PVF User's Guide for a complete list of these. Note: these environment variables only affect the behavior of PGI Accelerator directive-based programs; in particular, they have no effect on the behavior of CUDA Fortran executables.

To set an environment variable within a PVF project, use the Debugging | Environment property. We'll use the ACC_NOTIFY environment variable as an example. When ACC_NOTIFY is set to a nonnegative integer, a message is printed to standard output when a kernel is executed on an accelerator. To set it in PVF, change the Environment property to read ACC_NOTIFY=1 as follows:

[Figure: Environment property set to ACC_NOTIFY=1]

When the application runs, the notification messages will be interleaved with standard output:

[Figure: program output with interleaved ACC_NOTIFY messages]

The Debugging | Environment property can be used to set any environment variable, including PATH. The ACC_DEVICE environment variable can be used to select the type of device on which to run, or to specify host execution. The ACC_DEVICE_NUM environment variable can be used to specify which device to use on a system with multiple GPUs installed. Full descriptions of these and the other PGI Accelerator environment variables can be found in the PVF User's Guide.
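To confirm that a value entered in the Debugging | Environment property actually reaches your program, you can print it at startup. The sketch below uses the Fortran 2003 intrinsic `get_environment_variable`, assuming your compiler version supports it. The program name and the 64-character buffer length are arbitrary choices for this example.

```fortran
program show_acc_environment
  implicit none
  ! Names are padded to a common length, as array constructors require.
  character(len=14), parameter :: names(3) = &
      (/ 'ACC_NOTIFY    ', 'ACC_DEVICE    ', 'ACC_DEVICE_NUM' /)
  character(len=64) :: val
  integer :: i, stat

  do i = 1, size(names)
    call get_environment_variable(trim(names(i)), val, status=stat)
    ! status is 0 on success and -1 if the value was truncated to fit val.
    if (stat == 0 .or. stat == -1) then
      print '(4a)', trim(names(i)), ' = "', trim(val), '"'
    else
      print '(2a)', trim(names(i)), ' is not set'
    end if
  end do
end program show_acc_environment
```

This only reports what the program sees; the PGI Accelerator runtime reads the variables itself, so no such code is needed for them to take effect.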

Conclusion

Once you are familiar with Visual Studio, PGI Visual Fortran provides an easy and intuitive programming environment for creating new Fortran 95/03, OpenMP and MPI applications or working on large existing applications. In addition to ease-of-use features, PVF provides the properties and options you need to extract maximum performance from Fortran applications on the latest multi-core x64 CPUs from Intel and AMD: vector SSE code generation, interprocedural optimization, automatic parallelization for multi-core, and full support for OpenMP and MPI debugging.

Now, in addition, PVF gives you an incremental path to GPU acceleration using PGI Accelerator directives and access to the full CUDA programming environment and API using PGI CUDA Fortran language extensions. I encourage you to give these features a try and send us your feedback as we continue to add capabilities to PVF and tune the PGI compilers to create a state-of-the-art Fortran programming environment for today's heterogeneous multi-core workstations, servers and clusters.