Use PGI Fortran, C and C++ compilers to develop performance-portable applications for multicore x86-64 or OpenPOWER CPUs, and GPUs from NVIDIA. PGI compilers support Fortran 2003, C++14 and selected C++17 features. With PGI you can parallelize programs for multicore CPUs and NVIDIA GPUs using OpenACC 2.6, for multicore CPUs using OpenMP 4.5, and for NVIDIA GPUs using CUDA Fortran. PGI compilers are used on the world’s fastest computers, including on the Top 500 #1 Summit Supercomputer at Oak Ridge National Lab for GPU-accelerated CFD, quantum chemistry, weather and climate, molecular dynamics, and astrophysics applications. PGI compilers are for scientists and engineers using computing systems ranging from workstations to the fastest GPU-powered supercomputers.

SPEC ACCEL Performance Comparison

PGI compilers deliver the performance you need on CPUs, and the features you need for HPC applications development on GPU-accelerated systems. OpenACC and CUDA programs can run several times faster on a single Tesla V100 GPU compared to all the cores of a dual-socket server, and interoperate with MPI and OpenMP to deliver the full power of today’s multi-GPU servers.

One OpenACC Code Base for Multiple Targets

Is your application 10s or 100s of thousands of lines of Fortran, C and C++ code? With OpenACC directives, you don’t have to parallelize all of it at once. You can identify hot loops and code regions using the PGPROF profiler, then incrementally parallelize and tune them one by one. OpenACC code remains 100% standard-compliant and portable to other compilers and platforms, and enables parallel processing on CPUs and GPUs using identical source code.
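As a rough illustration of the incremental approach described above, the sketch below annotates two hot loops with OpenACC directives and leaves the rest of the code untouched. The function names `saxpy` and `dot` are hypothetical examples, not part of any PGI library, and the data clauses shown are one reasonable choice rather than the only one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hot loop: y = a*x + y.
 * The directive asks an OpenACC compiler to parallelize the loop.
 * A compiler without OpenACC support ignores the pragma and runs the
 * loop serially, so the same source builds everywhere. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* copyin: x is only read; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical second hot loop: a dot product.
 * The reduction clause tells the compiler how to combine the
 * per-thread partial sums into a single result. */
double dot(int n, const float *restrict x, const float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += (double)x[i] * (double)y[i];
    }
    return sum;
}
```

With the PGI compilers, a build along the lines of `pgcc -acc -ta=tesla -Minfo=accel` would target an NVIDIA GPU and `pgcc -acc -ta=multicore` would target all CPU cores from the same source. Exact flags vary by compiler release, so treat these as a starting point and check the documentation for your version.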

CloverLeaf Performance

CloverLeaf Performance Comparison

CloverLeaf, a Lagrangian-Eulerian explicit hydrodynamics mini-application, is a small (4,500 line) lightweight application that is representative of a code used at the United Kingdom’s Atomic Weapons Establishment (AWE). Using OpenACC, performance on an NVIDIA V100 GPU is four times faster than a dual-socket 40-core Intel Skylake CPU, running the fully optimized code on the bm32 data set. It scales to almost 15 times faster on 4xV100s using MPI+OpenACC. The optimizations to the source code made during porting to the GPU using OpenACC improved the performance of the CPU code by more than 50%.

PGI Supports All Major HPC Platforms

HPC servers are quickly expanding beyond multicore x86 CPUs to OpenPOWER, Arm and GPU accelerators. PGI Fortran, C and C++ compilers and OpenACC are designed to deliver high performance on all of these processors. PGI compilers for x86, OpenPOWER and GPUs are available now, including OpenACC parallelization across all cores of a multicore CPU or a GPU. PGI and OpenACC deliver the performance you need today, and the flexibility you need tomorrow. PGI compilers can take you there.

PGI Profiler GUI

The PGI Profiler is a powerful and easy-to-use interactive performance profiler for parallel programs written with OpenMP or OpenACC directives, or using CUDA. Use it to visualize and analyze the performance of your Fortran, C and C++ programs. The PGI Profiler can correlate execution time with procedures, source code and instructions, allowing you to quickly see where and how execution time is spent. Through resource utilization data and compiler feedback information, the PGI Profiler provides features that will help you understand why parts of your program have high execution times and how you can modify your source code or compiler options to improve performance. The PGI Profiler is included with all PGI products.

Mike Frisch, President, Gaussian, Inc.
Using OpenACC allowed us to continue development of our fundamental algorithms and software capabilities simultaneously with the GPU-related work. In the end, we could use the same code base for SMP, cluster/network and GPU parallelism. PGI's compilers were essential to the success of our efforts.
Mike Frisch, PhD, President and CEO
Gaussian, Inc.
Dr. Georg Kresse, University of Vienna
For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar and in some cases better than CUDA C, and OpenACC dramatically decreases GPU development and maintenance efforts. We’re excited to collaborate with NVIDIA and PGI as an early adopter of CUDA Unified Memory.
Dr. Georg Kresse, Univ.-Prof. Dipl. Ing.
University of Vienna
Sunil Sathe, Lead Software Developer, ANSYS Fluent
We’ve effectively used OpenACC for heterogeneous computing in ANSYS Fluent with impressive performance. We’re now applying this work to more of our models and new platforms.
Sunil Sathe, Lead Software Developer
ANSYS Fluent
Dr. Richard Loft, Dir. of Technology Development, NCAR
Our team has been evaluating OpenACC as a pathway to performance portability for the Model for Prediction Across Scales (MPAS) atmospheric model. Using this approach on the MPAS dynamical core, we have achieved performance on a single P100 GPU equivalent to 2.7 dual socketed Intel Xeon nodes on our new Cheyenne supercomputer.
Dr. Richard Loft, Director, Technology Development
David Gutzwiller, Lead Software Developer, NUMECA
Porting our unstructured C++ CFD solver FINE/Open to GPUs using OpenACC would have been impossible two or three years ago, but OpenACC has developed enough that we’re now getting some really good results.
David Gutzwiller, Lead Software Developer
Filippo Spiga, Head of Research Software Engineering, University of Cambridge
CUDA Fortran gives us the full performance potential of the CUDA programming model. While leveraging the potential of explicit data movement, !$CUF KERNELS directives give us productivity and source code maintainability. It’s the best of both worlds.
Filippo Spiga, Senior Contributor
Quantum ESPRESSO Group