PGI Fortran, C and C++ compilers and tools are supported on most x64 processor-based systems. Optimizing performance of the x64 processors in these systems often depends on maximizing SSE vectorization, ensuring alignment of vectors, and minimizing the number of cycles the processors are stalled waiting on data from main memory. The PGI compilers support a number of directives and options that allow the programmer to control and guide optimizations, including vectorization, parallelization, function inlining, memory prefetching, and interprocedural optimization. In this paper, we provide detailed examples of the use of several of these features as a means of extracting maximum single-node performance from x64 processor-based systems using PGI compilers and tools.
Tuning numerically intensive C++ applications for maximum performance can be a challenge. This paper illustrates the importance of SSE vectorization on modern processors and uses the ALEGRA shock physics code as an example of how a C++ application can be restructured to enable vectorization and other optimizations that lead to dramatic performance improvements.
Presented at SC|07 in Reno on 14 November 2007.
Presented at SC|06 in Tampa on 21 November 2006.
Keynote address at CASES 2005, San Francisco, September 25, 2005.
Published by HPCwire
Published in Dr. Dobb's Journal, August 2005