PGI 17.7 Compilers for Heterogeneous Supercomputing Now Available
September 14, 2017
Support for NVIDIA Volta GPUs, OpenACC interoperability with CUDA Unified Memory, and OpenMP 4.5 for multicore CPUs
Version 17.7 of the PGI® 2017 Compilers and Tools is now available, delivering improved performance and programming simplicity to high-performance computing (HPC) developers who target multicore CPUs and heterogeneous GPU-accelerated systems.
Key new features of the PGI 17.7 Compilers & Tools, available immediately, include:
- Tesla V100 GPU support — PGI OpenACC and CUDA Fortran now support the new NVIDIA Volta GV100 GPU, offering more memory bandwidth, more streaming multiprocessors, next-generation NVLink and new microarchitectural features that add up to better performance and programmability.
- OpenACC for CUDA Unified Memory — the PGI 17.7 compilers leverage CUDA Unified Memory to simplify OpenACC programming on GPU-accelerated systems. When OpenACC allocatable data is placed in CUDA Unified Memory using a simple compiler option, no explicit data movement code or directives are needed.
- OpenMP 4.5 for Multicore CPUs — Initial support for OpenMP 4.5 syntax and features allows the compilation of most OpenMP 4.5 programs for parallel execution across all the cores of a multicore CPU system. TARGET regions are implemented with default support for the multicore host as the target, and PARALLEL and DISTRIBUTE loops are parallelized across all OpenMP threads.
- Automatic Deep Copy of Fortran Derived Types — Movement of aggregate or deeply nested Fortran data objects between CPU host and GPU device memory, including traversal and management of pointer-based objects, is now supported using OpenACC directives.
- C++ Enhancements — The PGI 17.7 C++ compiler includes incremental C++17 features and is supported as a CUDA 9.0 NVCC host compiler. It delivers an average 20% performance improvement on the LCALS loop benchmarks.
- Use C++14 Lambdas with Capture in OpenACC Regions — C++ lambda expressions provide a convenient way to define anonymous function objects at the location where they are invoked or passed as arguments. Starting with the PGI 17.7 release, lambdas are supported in OpenACC compute regions in C++ programs, for example to drive code generation customized to different programming models or platforms. C++14 opens the door to more lambda use cases, especially polymorphic (generic) lambdas, and those capabilities are now usable in OpenACC programs.
- Interoperability with the cuSOLVER Library — Call optimized cuSolverDN routines from CUDA Fortran and OpenACC Fortran, C and C++ using the PGI-supplied interface module and the PGI-compiled version of the cuSOLVER library bundled with PGI 17.7.
- PGI Unified Binary for NVIDIA Tesla and Multicore CPUs — Use OpenACC to build applications for both GPU acceleration and parallel execution on multicore CPUs. When run on a GPU-enabled system, OpenACC regions offload and execute on the GPU. When run on a system without GPUs installed, OpenACC regions execute in parallel across all CPU cores in the system.
- New profiling features for CUDA Unified Memory and OpenACC — The PGI 17.7 Profiler adds new OpenACC profiling features, including support for multicore CPUs with or without attached GPUs and a new summary view that shows the time spent in each OpenACC construct. New CUDA Unified Memory features include correlating CPU page faults with the source code lines where the associated data was allocated, support for the new CUDA Unified Memory page thrashing, throttling and remote map events, NVLink support and more.
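To make the lambda and unified-binary items above more concrete, here is a minimal C++ sketch, not taken from PGI documentation, of a C++14 generic lambda with captures invoked inside an OpenACC parallel loop. The helper names transform_acc and scale_and_shift are hypothetical. The explicit copyin/copyout clauses are included so the sketch does not depend on CUDA Unified Memory; with the Unified Memory compiler option described above they may be unnecessary. A compiler without OpenACC support ignores the #pragma acc line and runs the loop serially with the same result.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: applies a caller-supplied operation to each element
// inside an OpenACC parallel loop. The explicit data clauses keep the sketch
// independent of CUDA Unified Memory. A compiler without OpenACC support
// ignores the pragma and runs the loop serially with the same result.
template <typename Op>
void transform_acc(const double* in, double* out, std::size_t n, Op op) {
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = op(in[i]);  // the lambda is invoked inside the compute region
    }
}

// Hypothetical caller: a C++14 generic (polymorphic) lambda that captures
// a and b by value, so the compute region works on its own copies of them.
std::vector<double> scale_and_shift(const std::vector<double>& x,
                                    double a, double b) {
    std::vector<double> y(x.size());
    auto op = [a, b](auto v) { return a * v + b; };
    transform_acc(x.data(), y.data(), x.size(), op);
    return y;
}
```

Passing the lambda as a template parameter lets the compiler inline it into the loop body, which is what makes it practical to use inside an offloaded region.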
Other features and enhancements of PGI 17.7 include comprehensive support for environment modules on all supported platforms, prebuilt versions of popular open source libraries and applications, and a new "Introduction to Parallel Computing with OpenACC" video tutorial series. See the PGI website for a complete list of PGI 17.7 features and capabilities.
PGI 17.7 is available for download today to all PGI Professional customers with active maintenance.
PGI Accelerator Compilers for POWER Architecture Enable Easy On-ramp to GPU Acceleration with POWER8 and NVIDIA NVLink
SALT LAKE CITY—SC16
November 15, 2016
NVIDIA® today announced general availability of the PGI Accelerator Fortran, C and C++ compiler suite, including support for OpenACC® directives-based parallel programming, for computer systems based on POWER CPUs, including IBM’s OpenPOWER LC servers that combine POWER8 CPUs with NVIDIA® Tesla® GPU accelerators connected through NVIDIA NVLink.
With this latest release, PGI users can easily build and maintain large production HPC applications using the same source code, compiler options and build scripts on either multicore Linux/x86 or Linux/POWER CPUs, with or without GPU accelerators. This lets HPC developers make more effective use of multiple system architectures, backed by optimizing compilers designed to deliver application performance portability across systems and an easy on-ramp to GPU programming.
"This release marks a milestone in our efforts to provide HPC developers with a means to port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base," said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. "This capability is critical now that heterogeneous parallel computing platforms have become the norm, and as accelerated HPC system architectures continue to evolve complex memory hierarchies that in many cases must be managed either by the programmer or by a compiler."
In addition to Fortran 2003, C11 and C++14 language features, the new PGI compilers for OpenPOWER include PGI’s CPU and accelerator technologies and optimizations, including OpenMP 3.1, OpenACC 2.5 and CUDA® Fortran. PGI Accelerator for POWER also includes the PGPROF CPU+GPU performance profiler, a key component that enables performance analysis and optimization of accelerator-enabled applications. The PGI Accelerator compilers and tools for POWER are included with all PGI products for Linux systems, including the new no-cost PGI Community Edition.
"Easier programming methodologies like OpenMP and OpenACC are critical for the widespread adoption of GPU-accelerated systems," said Sumit Gupta, Vice President High Performance Computing & Data Analytics, IBM. "The new PGI compilers take advantage of the high-speed NVIDIA NVLink connection between the POWER8 CPU and the NVIDIA Pascal P100 GPU accelerators, along with the page migration engine, to make it much easier to accelerate and enhance performance of high performance computing and data analytics workloads."
Key benefits of PGI compilers for OpenPOWER include:
- Performance portability across CPU and CPU+GPU architectures
- A single source code base across x86 and OpenPOWER processor-based systems
- Support for OpenACC and CUDA Fortran on NVIDIA Tesla GPUs
- Support for the new POWER8 CPU-based systems with NVIDIA NVLink interconnect and NVIDIA Tesla P100 GPUs
The PGI compiler suite for OpenPOWER is among the tools Oak Ridge National Laboratory will use to build and run large HPC applications on x86 CPUs, OpenPOWER CPUs and NVIDIA GPUs using the same source code base.
"Porting HPC applications from one platform to another is a significant and challenging effort in the adoption of new hardware technologies," said Tjerk Straatsma, Scientific Computing Group Leader at Oak Ridge National Laboratory. "Architectural and performance portability like this is critical to our application developers and users as we move from existing CPU-only and GPU-enabled applications on machines like Titan to DOE’s upcoming major systems including the Summit system we’re installing at ORNL."
PGI is demonstrating the PGI Accelerator compilers for OpenPOWER in booth 2131 at SC16 in Salt Lake City, Nov. 14–17. Additional information is available at www.pgroup.com/openpower.
PGI Accelerator Compilers Add OpenACC Support for x86 Multicore CPUs
Santa Clara, Calif.
October 28, 2015
NVIDIA today announced availability of version 15.10 of the PGI Accelerator™ Fortran, C and C++ compilers, adding support for the OpenACC® directives-based parallel programming standard on x86 architecture multicore microprocessors.
The new PGI compilers deliver performance portability, allowing OpenACC-enabled source code to be compiled for parallel execution on a multicore CPU or a GPU accelerator. This capability provides tremendous flexibility for programmers, enabling them to develop applications that can take advantage of multiple system architectures with a single version of their source code.
"Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base," said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. "This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach."
This new PGI feature compiles OpenACC compute regions for parallel execution across all of the cores in an x86 processor or multi-socket server. The cores are treated in aggregate as a shared-memory accelerator, eliminating all data movement overhead in the resulting OpenACC programs. By default the compiler generates code that uses all the available cores in the system, and several methods exist for programmers to control and fine-tune this behavior.
"We were extremely impressed that we can run OpenACC on a CPU with no code change and get equivalent performance to our OpenMP/MPI implementation, and get 4x faster performance when running on a GPU," said Wayne Gaudin of the U.K.’s Atomic Weapons Establishment. "From the perspective of performance portability and code future proofing, this is an excellent result."
Key benefits of running OpenACC on multicore CPUs include:
- Effective utilization of all cores of a multicore CPU or multi-socket server for parallel execution
- Common programming model across CPUs and GPUs in Fortran, C and C++
- Rapid exploitation of existing multicore parallelism in a program using the KERNELS directive, which enables incremental optimization for parallel execution
- Scalable performance across multicore CPUs and GPUs
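The multicore behavior described above can be sketched with two small C++ functions; the names axpy and dot are hypothetical and the code is illustrative rather than taken from PGI. As we understand the PGI 15.10 options, compiling with -acc -ta=multicore should run these loops across all host cores, with the data clauses adding no transfers because the cores share memory with the host, while a GPU target would use the same clauses to move the arrays; check the release notes for the exact options of a given version. Compilers without OpenACC support ignore the pragmas and run the loops serially.

```cpp
#include <cstddef>

// Hypothetical helper: y[i] += a * x[i] using the KERNELS construct.
// "independent" asserts that iterations do not depend on one another, which
// lets the compiler parallelize even though x and y are raw pointers.
void axpy(double a, const double* x, double* y, std::size_t n) {
    #pragma acc kernels loop independent copyin(x[0:n]) copy(y[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Hypothetical helper: dot product with an explicit sum reduction, so each
// core (or GPU thread) accumulates a partial sum that is combined at the end.
double dot(const double* x, const double* y, std::size_t n) {
    double sum = 0.0;
    #pragma acc kernels loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With KERNELS, the compiler is left to decide how to map each loop; the independent and reduction clauses are the kind of incremental hints a developer can add when the compiler's own analysis is too conservative.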
"Porting HPC applications from one platform to another is one of the most significant costs in the adoption of breakthrough hardware technologies," said Buddy Bland, project director at Oak Ridge National Laboratory. "OpenACC for multicore x86 CPUs provides continuity and code portability from existing CPU-only and GPU-enabled applications from machines like Titan to all of DOE’s upcoming major systems as well as portability among those systems."
Growing Momentum for OpenACC
There are more than 10,000 developers using OpenACC today, and several recent developments underscore its continually growing adoption in high performance computing. At recent hackathons held worldwide, experts from a variety of scientific domains have been accelerating their applications using OpenACC. These include applications in such diverse fields as MRI image reconstruction (PowerGrid), computational fluid dynamics (INCOMP3D, HiPSTAR and Numeca), cosmology and astrophysics (RAMSES, CASTRO and MAESTRO), quantum chemistry (LSDALTON), computational physics (NekCEM) and more.
In addition, Gaussian, Inc. has announced that it is using OpenACC to port the GAUSSIAN computational chemistry application to accelerators. At the recent iCAS2 conference on climate and weather in Annecy, France, Meteosuisse, the Swiss Federal Office of Meteorology and Climatology, announced the deployment of a GPU-accelerated version of COSMO, the world’s first production weather forecasting application running on GPU accelerators.
In a recent poll of 150 OpenACC developers, 94 percent of the respondents reported getting a speedup when running on an accelerator, and over 90 percent would recommend OpenACC.
Availability and Free Trial
PGI 15.10 with support for OpenACC on multicore CPUs is expected to be available this month directly from PGI and authorized resellers. New users can register for a free 90-day trial as part of the NVIDIA OpenACC Toolkit. University students and faculty can apply for a free PGI license.
PGI High Performance Computing Compilers Coming to IBM POWER Systems
Santa Clara, Calif.
November 18, 2014
Optimizing Compilers Enable Developers to Easily Develop and Migrate Linux x86 Applications to GPU-Accelerated POWER Systems
NVIDIA today announced that it is developing an enhanced version of the widely used PGI optimizing compilers which will allow developers to quickly develop new applications or run Linux x86-based GPU-accelerated applications on IBM POWER CPU systems with minimal effort.
The PGI optimizing Fortran, C and C++ compilers for POWER will provide a user interface, language features, parallel programming features and optimization capabilities that are identical to those available in the PGI Linux/x86 compilers.
The new compilers also will support high performance computing (HPC) systems based on the IBM POWER architecture, including the recently announced NVIDIA GPU-accelerated IBM POWER8 systems, and additional systems under development by members of the OpenPOWER Foundation.
"Our goal is to let HPC developers recompile and run their applications on all major CPU and GPU-accelerated platforms with uniformly high performance using a common source code base," said Douglas Miles, director of PGI Compilers & Tools at NVIDIA. "We expect most GPU-accelerated x86 applications currently built with PGI compilers will port to GPU-accelerated POWER systems with a simple re-compile."
The POWER8 CPU is a massively multi-threaded processor, featuring 12 cores each capable of handling eight hardware threads simultaneously. Originally positioned for big data and cloud server applications, the POWER8 architecture is generating strong demand from HPC customers given its many performance-oriented features, such as a high-bandwidth CAPI port (Coherent Accelerator Processor Interface) and future support for the NVLink high-speed GPU interconnect.
"Porting and optimizing production HPC applications from one platform to another can be one of the most significant costs in the adoption of breakthrough hardware technologies," said Buddy Bland, project director of the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory. "The PGI compiler has been our primary compiler on Jaguar and Titan since 2005. Having the PGI compiler suite available in the POWER environment will provide continuity and facilitate code portability of existing CPU-only and GPU-enabled Titan applications to our next major system."
"IBM’s Linux and x86 HPC customers have long had the luxury of leveraging the best capabilities and features from multiple HPC compiler solutions," said Dave Turek, vice president, Technical Computing OpenPOWER at IBM. "With the availability of PGI compilers alongside the widely used IBM XL optimizing compilers for POWER8, our customers will now have this same flexibility and advantage on current and next-generation IBM POWER System platforms as well."
Key features of the PGI compilers and tools for IBM POWER-based systems will include:
- OpenACC directives for accelerators – Comprehensive support for OpenACC features in the PGI Accelerator native Fortran 2003, C11 and C++11 compilers on the latest generation of GPU accelerators from NVIDIA, including support for unified memory.
- PGI CUDA Fortran extensions – Feature parity with CUDA Fortran on Linux/x86 platforms, offering the flexibility and power of the NVIDIA® CUDA® programming model in a native Fortran compiler for GPU-accelerated POWER systems.
- Faster OpenMP performance – PGI compilers deliver an average of 75 percent faster performance on the latest SPEC® OMP2012 benchmark suite, compared to GCC 4.8 using the latest AVX-enabled multi-core x64 processors from Intel and AMD.
- PGI optimization features – Fortran 2003, C11 and C++11 compilers with the full range of PGI multi-core optimizations including comprehensive loop optimizations, memory hierarchy optimizations, SIMD vectorization, function inlining, inter-procedural analysis and optimization, profile feedback and more.
For a complete list of the features and capabilities of PGI compilers and tools, visit https://www.pgroup.com/support/new_rel.htm.
Availability and Free Trial
NVIDIA will announce the availability of POWER support in the PGI compilers at a future date. PGI 2014 with x86 support is available today directly from NVIDIA and authorized resellers. New users can register for a free 30-day trial of PGI 2014 at www.pgroup.com.
PGI Accelerator Compilers Add Support for AMD APUs and GPUs
Santa Clara, Calif.
February 14, 2014
New Release Adds OpenACC 2.0 Features for NVIDIA and AMD GPU Accelerators, Delivers Multi-core x64 Performance Gains
PGI, a leading suite of high-performance parallelizing compilers and development tools, now features support for the latest version of the OpenACC programming standard on accelerator platforms.
Available today, PGI® 2014 Compilers and Tools includes new capabilities for programming the recently announced NVIDIA® Tesla® K40 GPU accelerators using version 2.0 features of the OpenACC directives-based parallel programming specification. It also provides, for the first time, OpenACC support for AMD Radeon GPUs and APUs.
"We applaud PGI’s ability to extract performance from AMD discrete GPUs and APUs using OpenACC," said Suresh Gopalakrishnan, corporate vice president and general manager of the Server business at AMD. "It will help break down the remaining barriers to wide-scale accelerator adoption, and decouple the choice of accelerator programming model from the choice of accelerator hardware."
Key features of PGI 2014 Compilers and Tools include:
- OpenACC 2.0 Features—PGI Accelerator native Fortran 2003, C99 and C++ compilers expand support for key OpenACC 2.0 features, including the routine directive (procedure calls in accelerator regions), unstructured data lifetimes and others.
- New NVIDIA® CUDA® Fortran Extensions—Add support for version 5.5 of the NVIDIA CUDA parallel programming platform, CUDA atomic functions and device-side debugging using Allinea DDT and TotalView from Rogue Wave.
- Free PGI for OS X—Fortran 2003 and C99 compilers with all PGI multi-core x64 optimizations, command-line debugging and streamlined online documentation.
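The routine directive listed above, which allows procedure calls inside accelerator regions, can be illustrated with a small C++ sketch. The function names clamp01 and clamp_all are hypothetical, and the example is a plain reading of the OpenACC 2.0 routine directive rather than PGI sample code. On a compiler without OpenACC support the pragmas are ignored and the code runs serially.

```cpp
#include <cstddef>

// "routine seq" marks clamp01 as callable from inside accelerator compute
// regions; each call executes sequentially on the thread that makes it.
#pragma acc routine seq
double clamp01(double v) {
    return v < 0.0 ? 0.0 : (v > 1.0 ? 1.0 : v);
}

// Hypothetical helper: clamps every element of data to [0, 1] in parallel,
// calling clamp01 from within the OpenACC loop.
void clamp_all(double* data, std::size_t n) {
    #pragma acc parallel loop copy(data[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = clamp01(data[i]);
    }
}
```

Without the routine directive, an accelerator compiler would typically need to inline clamp01 to offload the loop; the directive lets the function be compiled separately for the device.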
PGI 2014 compilers deliver an average of 75 percent faster performance on the latest SPEC® OMP2012 benchmark suite, compared to GCC using the latest AVX-enabled multi-core Intel and AMD x64 processors. Additional capabilities of PGI 2014 Compilers and Tools include full Fortran 2003 support, incremental Fortran 2008 features, updated libraries, support for the latest operating systems and a comprehensive suite of new and updated code examples and tutorials.
For a complete list of the features and capabilities of PGI 2014 Compilers and Tools, visit https://www.pgroup.com/support/new_rel.htm.
"The use of accelerators in high performance computing is now mainstream," said Douglas Miles, director of PGI Software at NVIDIA. "With PGI 2014, we are taking another big step toward our goal of providing platform-independent, multi-core and accelerator programming tools that deliver outstanding performance on multiple platforms without the need for extensive, device-specific tuning."
PGI 2014 is available today directly from NVIDIA and authorized resellers. A free 30-day trial of PGI 2014 is available for new users at www.pgroup.com. Registration is required.
PGI Accelerator Compilers Add Support for AMD APUs and GPUs
July 16, 2013
PGI Beta release supports OpenACC directive-based accelerator programming for AMD APUs and discrete GPUs
The Portland Group® (PGI), a wholly-owned subsidiary of STMicroelectronics and the leading independent supplier of compilers and tools for high-performance computing (HPC), today announced availability of a Beta release of the PGI Accelerator Fortran, C and C++ compilers with support for the OpenACC® API targeting AMD Accelerated Processing Units (APUs) and discrete Graphics Processing Units (dGPUs).
"One of PGI’s goals is to increase productivity and provide performance portability for applications developed and maintained by science and engineering domain experts," said Douglas Miles, Director of The Portland Group. "The OpenACC standard was developed in direct response to the HPC community’s interest for a vendor-neutral, platform-independent, directive-based accelerator programming model. Adding PGI Accelerator support for AMD APUs and GPUs is the latest step in the evolution of OpenACC and compiler technology for heterogeneous parallel computing at PGI."
Unveiled in November 2011, the OpenACC API was developed by PGI, Cray, and NVIDIA, with support from CAPS Entreprise. OpenACC is already supported by PGI compilers on NVIDIA® GPUs with the CUDA® parallel-programming architecture. The OpenACC 1.0 specification was developed jointly by the founding members and is based on the PGI Accelerator programming model. The OpenACC 2.0 specification has just recently been ratified.
The OpenACC Application Programming Interface (API) describes a collection of compiler directives to specify loops and regions of code in standard C, C++ and Fortran. These regions can be offloaded from a host CPU to an attached accelerator, providing portability across operating systems, host CPUs and accelerators. By exposing parallelism to the compiler, directives allow the compiler to do the detailed work of mapping the computation onto the accelerator to deliver significant improvements to application performance. Using directives, developers can have a single code base that is multi-platform and multi-vendor compatible, a key advantage for multi-platform and multi-generation application development.
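The directive model described above can be sketched in C++ as follows. This is an illustrative example, not PGI code: the function name axpy_squared and its parameters are hypothetical. A structured data region copies x and y to the accelerator once, allocates the scratch array tmp only on the device, and copies out back when the region ends, so the two offloaded loops share device data without intermediate transfers. Because the C++ standard requires unrecognized pragmas to be ignored, the same source also compiles and runs serially on a compiler without OpenACC support, which is the single-code-base property the paragraph describes.

```cpp
#include <cstddef>

// Hypothetical example: out[i] = (a * x[i] + y[i])^2, computed in two passes.
// tmp must be a caller-supplied buffer of n elements: without OpenACC the
// loops run on the host and write it directly.
void axpy_squared(double a, const double* x, const double* y,
                  double* tmp, double* out, std::size_t n) {
    // x and y are copied in once, tmp exists only on the device, and out is
    // copied back to the host when the data region ends.
    #pragma acc data copyin(x[0:n], y[0:n]) create(tmp[0:n]) copyout(out[0:n])
    {
        #pragma acc parallel loop present(x[0:n], y[0:n], tmp[0:n])
        for (std::size_t i = 0; i < n; ++i) {
            tmp[i] = a * x[i] + y[i];
        }
        // tmp is still resident on the accelerator; no transfer between loops.
        #pragma acc parallel loop present(tmp[0:n], out[0:n])
        for (std::size_t i = 0; i < n; ++i) {
            out[i] = tmp[i] * tmp[i];
        }
    }
}
```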
In a recent poll of over 1200 OpenACC evaluators, over 70% of the respondents found OpenACC easy to use and more than 75% reported seeing a speed-up when running on an accelerator.
"AMD is very pleased with the beta release of the PGI Accelerator Fortran, C and C++ compilers with support for the OpenACC API that targets AMD APUs and discrete GPUs," said Margaret Lewis, director, server software planning at AMD. "OpenACC is being adopted by HPC researchers and programmers as they look for easier ways to take advantage of the benefits of accelerated computing. OpenACC provides a straightforward means for programmers to accelerate their applications using familiar programming techniques. It also provides a path for legacy applications to maintain code portability and still take advantage of the newest high-performance heterogeneous parallel computing architectures."
The first Beta release of the PGI Accelerator compilers with support for the OpenACC standard on AMD dGPU and APU platforms is available now on a limited basis, with an open Beta release currently scheduled for later in 2013. Interested Beta testers can request access by contacting PGI directly. The Beta software includes a restricted-use license and the license agreement is available at www.pgroup.com/support/BTLA.
More information on the PGI Accelerator compilers with OpenACC support is available at www.pgroup.com/accel. More information on the OpenACC API and standard can be found at www.openacc.org.
The Portland Group
The Portland Group Ships Major HPC Compilers and Development Tools Update
February 13, 2013
PGI 2013 delivers expanded support for programming HPC accelerators plus industry-leading multi-core x64 performance
The Portland Group® (PGI), a wholly-owned subsidiary of STMicroelectronics and the leading independent supplier of compilers and tools for high-performance computing, today announced that the 2013 release of the PGI® high-performance parallelizing compilers and development tools for Linux, Apple OS X and Microsoft Windows is now available. PGI 2013 includes new features and capabilities for programming the latest HPC accelerators using the OpenACC API. It also delivers significant performance gains on multi-core x64 processors.
"The high-performance computing landscape is evolving rapidly. With the recent introduction of new accelerators from NVIDIA, Intel and AMD, HPC users have more options than ever," said Douglas Miles, director of The Portland Group. "With PGI 2013, we are expanding support within our PGI Accelerator programming tools so developers wishing to access the huge potential performance of these new platforms can do so in a consistent, productive and portable way."
The 2013 release of the PGI Accelerator native Fortran 2003 and C99 compilers expands support for the OpenACC directive-based accelerator programming model through the addition of an all-new PGI Accelerator C++ compiler. All three compilers feature expanded support for the OpenACC standard as well as new PGI extensions for supporting multiple devices. PGI Accelerator compilers also now target the latest NVIDIA Tesla K20 and K20X GPUs. Support for targeting Intel Xeon Phi coprocessors and AMD APUs and discrete GPUs with OpenACC is planned for a future release. New CUDA Fortran extensions in PGI 2013 include support for textures as well as support for dynamic parallelism and separate compilation on suitable CUDA-enabled hardware. Both PGI Accelerator and CUDA Fortran now support the latest CUDA 5.0 software environment from NVIDIA in addition to supporting multiple devices from a single program or host thread.
In addition to expanded support for accelerators, PGI 2013 also delivers significantly faster performance on multi-core x64 processors including industry-leading OpenMP parallel performance on the new SPEC® OMP2012 benchmark suite* running on the latest AVX-enabled processors from AMD and Intel (see chart below). Overall performance on the SPEC CPU 2006 floating-point benchmarks is over 10% faster compared to the initial version of PGI 2012 released in February 2012. Similar performance gains have been seen on other HPC benchmarks as well.
Additional features and enhancements in PGI 2013 include:
- GNU 4.7 compatible C++ in an all-new compiler that comes complete with the full suite of PGI optimizations plus support for CUDA-x86, OpenMP and OpenACC.
- Fortran 2003 features added include recursive I/O, parameterized derived types, deferred type parameters and deferred character length.
- OpenMP 3.1 support including task yield and new atomic functions in C.
- PGI Visual Fortran® is now integrated with Visual Studio 2012 and includes the Visual Studio 2012 shell.
- PGDBG® parallel MPI/OpenMP graphical debugging tool has an improved interface for displaying source code, including a new user-configurable disassembly display.
PGI 2013 supports the latest operating system releases including Red Hat Enterprise Linux 6.3, Fedora 17, OpenSuSE 12, Ubuntu 12.10, Windows 8 and OS X Mountain Lion.
More information about the PGI Accelerator compilers is available online at www.pgroup.com/accelerate. PGI CUDA Fortran information is available separately at www.pgroup.com/cudafortran. CUDA-x86 information is also available at www.pgroup.com/cuda-x86. Evaluation copies of the new PGI 2013 compilers are available. Registration is required.
* SPEC OMP2012 is designed to provide a comparative measure of compute intensive performance across platforms. More information is available at www.spec.org. More information about PGI 2013 SPEC OMP2012 performance is available on the PGI website at www.pgroup.com/benchmark.
SPEC® is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).