Sponsored by NVIDIA on 24 March 2010. Webinar slide deck is also available in PDF format.
Hosted by IEEE Spectrum on 16 June 2009. Webinar slide deck is also available in PDF format.
"Compilers and More" series
by Michael Wolfe
Published by HPCwire
Published by Linux Journal magazine, November 2008.
This document describes the next generation features and capabilities planned for the PGI Accelerator programming model.
This document describes the currently supported features and limitations of the PGI Accelerator programming model.
This document describes a collection of compiler directives used to specify regions of code in Fortran and C programs that can be offloaded from a host CPU to an attached accelerator. The method outlined provides a model for accelerator programming that is portable across operating systems and various types of host CPUs and accelerators. The directives extend the ISO/ANSI standard C and Fortran base languages in a way that allows a programmer to migrate applications incrementally to accelerator targets using standards-compliant Fortran or C.
NVIDIA CUDA is a general-purpose parallel programming architecture with compilers and libraries to support the programming of NVIDIA GPUs. This document describes CUDA Fortran, a small set of extensions to Fortran that supports and is built upon the CUDA computing architecture.
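To give a flavor of those extensions, here is a minimal CUDA Fortran kernel sketch. It is illustrative only: it assumes the PGI CUDA Fortran compiler and an NVIDIA GPU, and the kernel name and arguments are not taken from the document.

```fortran
module kernels
contains
  ! attributes(global) marks a subroutine that executes on the GPU;
  ! the predefined variables blockIdx, blockDim and threadIdx give
  ! each thread its index into the arrays.
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value    :: a
    real :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module kernels
```

A host program would declare `x` and `y` with the `device` attribute and launch the kernel with the chevron syntax, e.g. `call saxpy<<<grid, block>>>(n, a, x, y)`.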
CCFF is the Common Compiler Feedback Format, initially defined and implemented by PGI. PGI compilers add CCFF information to object and executable files; it can be extracted into a separate file or read directly from the object or executable file. The CCFF information is stored as XML, whose structure we describe here. See also the CCFF XML schema and the CCFF Repository XML schema.
A review of the options, issues, and techniques for porting to Windows, with particular attention to porting HPC programs to Windows HPC Server 2008. Includes a review of the different Windows development environment options, plus discussion and examples for addressing migration issues, including inter-language calling between Visual C++ and Fortran. Separate in-depth case studies look at using different build environments, porting MPI programs to MSMPI, and porting shared objects. (Version 2.0, published September 2008.)
Fortran was one of the first high-level computer programming languages and has been in use for over 50 years. During that time, a huge reservoir of HPC applications has been developed in Fortran. The Fortran language has evolved to support both performance-oriented programming and modern best practices in programming languages and parallel computing, and many HPC applications have incorporated components in C and C++ as well. As a result, there is an incredible range and diversity of HPC applications in use today, most of which were developed on UNIX-heritage HPC and cluster platforms. This paper presents the compiler and tools resources available and shows practical examples of porting such codes for use on Windows HPC Server 2008 clusters. (Version 2.0, published September 2008.)
– Introduction slide deck 783KB PDF
– Advanced slide deck 602KB PDF
– Tutorial examples 17KB TAR
– Slide deck 967KB PDF
– Tutorial examples 555KB TAR
Course materials from Michael Wolfe's day-long tutorial on programming GPUs using the directive-based PGI Accelerator compilers and CUDA Fortran.
– Tutorial examples and labs 25.5MB TAR
Presented at SC|08 in Austin on 18 November 2008.
PGI Fortran, C and C++ compilers and tools are supported on most x64 processor-based systems. Optimizing performance of the x64 processors in these systems often depends on maximizing SSE vectorization, ensuring alignment of vectors, and minimizing the number of cycles the processors are stalled waiting on data from main memory. The PGI compilers support a number of directives and options that allow the programmer to control and guide optimizations including vectorization, parallelization, function inlining, memory prefetching, interprocedural optimization, and others. In this paper we provide detailed examples of the use of several of these features as a means for extracting maximum single-node performance from x64 processor-based systems using PGI compilers and tools.
Tuning numerically intensive C++ applications for maximum performance can be a challenge. This paper illustrates the importance of SSE vectorization on modern processors, and uses the ALEGRA shock physics code as an example of how a C++ application can be re-structured to enable vectorization and other optimizations that lead to dramatic performance improvements.
Presented at SC|07 in Reno on 14 November 2007.
Presented at SC|06 in Tampa on 21 November 2006.
Keynote address at CASES 2005, San Francisco, 25 September 2005.