Why Choose PGI
Parallelizing Compilers for CPUs and GPUs
Use PGI Fortran, C and C++ compilers to develop performance-portable applications for multicore x86-64 or OpenPOWER CPUs, and GPUs from NVIDIA. PGI compilers support Fortran 2003, C++14 and selected C++17 features. With PGI you can parallelize programs for multicore CPUs and NVIDIA GPUs using OpenACC 2.6, for multicore CPUs using OpenMP 4.5, and for NVIDIA GPUs using CUDA Fortran. PGI compilers are used on the world’s fastest computers, including the Summit supercomputer at Oak Ridge National Laboratory, ranked #1 on the TOP500 list, for GPU-accelerated CFD, quantum chemistry, weather and climate, molecular dynamics, and astrophysics applications. PGI compilers are for scientists and engineers using computing systems ranging from workstations to the fastest GPU-powered supercomputers.
World-class CPU Performance, GPU Acceleration
PGI compilers deliver the performance you need on CPUs, and the features you need for HPC application development on GPU-accelerated systems. OpenACC and CUDA programs can run several times faster on a single Tesla V100 GPU than on all the cores of a dual-socket server, and they interoperate with MPI and OpenMP to deliver the full power of today’s multi-GPU servers.
Accelerate Your Code with OpenACC
Is your application tens or hundreds of thousands of lines of Fortran, C and C++ code? With OpenACC directives, you don’t have to parallelize all of it at once. You can identify hot loops and code regions using the PGPROF profiler, then incrementally parallelize and tune them one by one. OpenACC code remains 100% standard-compliant and portable to other compilers and platforms, and enables parallel processing on CPUs and GPUs using identical source code.
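As a rough illustration of what incremental parallelization can look like, the sketch below annotates two loops with OpenACC directives. The functions `saxpy` and `dot` are hypothetical stand-ins for hot loops a profiler might point to; they are not taken from any PGI example. A compiler without OpenACC support simply ignores the `#pragma acc` lines, so the same source still builds and runs serially.

```c
#include <assert.h>

/* Hypothetical hot loop #1: y = a*x + y.
 * The data clauses state what is copied to the device (x) and what is
 * copied both ways (y); without OpenACC the pragma is ignored. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0); /* precondition for the array sections below */

    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical hot loop #2, parallelized in a later step:
 * a dot product, which needs a sum reduction across iterations. */
float dot(int n, const float *restrict x, const float *restrict y)
{
    assert(n >= 0);

    float sum = 0.0f;
    #pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With the PGI compilers, a build along the lines of `pgcc -acc -ta=tesla -Minfo=accel file.c` would target an NVIDIA GPU, while `-ta=multicore` would target the CPU cores from the same source. Treat the exact option spelling as something to confirm against the documentation for your compiler release.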
Performance Portability Delivered
CloverLeaf, a Lagrangian-Eulerian explicit hydrodynamics mini-application, is a small (4,500-line), lightweight application that is representative of a code used at the United Kingdom’s Atomic Weapons Establishment (AWE). Using OpenACC, the fully optimized code runs four times faster on an NVIDIA V100 GPU than on a dual-socket, 40-core Intel Skylake CPU on the bm32 data set. It scales to almost 15 times faster on four V100s using MPI+OpenACC. The source code optimizations made while porting to the GPU with OpenACC also improved the performance of the CPU code by more than 50%.
Will Your Compiler Take You There?
HPC servers are quickly expanding beyond multicore x86 CPUs to OpenPOWER, Arm and GPU accelerators. PGI Fortran, C and C++ compilers and OpenACC are designed to deliver high performance on all of these processors. PGI compilers for x86, OpenPOWER and GPUs are available now, including OpenACC parallelization across all cores of a multicore CPU or a GPU. PGI and OpenACC deliver the performance you need today, and the flexibility you need tomorrow. PGI compilers can take you there.
Performance Profiling and Optimization
The PGI Profiler is a powerful and easy-to-use interactive performance profiler for parallel programs written with OpenMP or OpenACC directives, or using CUDA. Use it to visualize and analyze the performance of your Fortran, C and C++ programs. The PGI Profiler can correlate execution time with procedures, source code and instructions, allowing you to quickly see where and how execution time is spent. Through resource utilization data and compiler feedback information, the PGI Profiler provides features that will help you understand why parts of your program have high execution times and how you can modify your source code or compiler options to improve performance. The PGI Profiler is included with all PGI products.
What People Are Saying About PGI
“Using OpenACC allowed us to continue development of our fundamental algorithms and software capabilities simultaneously with the GPU-related work. In the end, we could use the same code base for SMP, cluster/network and GPU parallelism. PGI’s compilers were essential to the success of our efforts.”
Mike Frisch, PhD
President and CEO
“For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar and in some cases better than CUDA C, and OpenACC dramatically decreases GPU development and maintenance efforts. We’re excited to collaborate with NVIDIA and PGI as an early adopter of CUDA Unified Memory.”
Dr. Georg Kresse
Univ.-Prof. Dipl. Ing.
University of Vienna
“We’ve effectively used OpenACC for heterogeneous computing in ANSYS Fluent with impressive performance. We’re now applying this work to more of our models and new platforms.”
Sunil Sathe
Lead Software Developer
“Our team has been evaluating OpenACC as a pathway to performance portability for the Model for Prediction Across Scales (MPAS) atmospheric model. Using this approach on the MPAS dynamical core, we have achieved performance on a single P100 GPU equivalent to 2.7 dual socketed Intel Xeon nodes on our new Cheyenne supercomputer.”
Dr. Richard Loft
Director, Technology Development
“Porting our unstructured C++ CFD solver FINE/Open to GPUs using OpenACC would have been impossible two or three years ago, but OpenACC has developed enough that we’re now getting some really good results.”
David Gutzwiller
Lead Software Developer
“CUDA Fortran gives us the full performance potential of the CUDA programming model. While leveraging the potential of explicit data movement, !$CUF KERNELS directives give us productivity and source code maintainability. It’s the best of both worlds.”
Filippo Spiga
Senior Contributor
Quantum ESPRESSO Group