PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for MPI process-parallel and OpenMP thread-parallel programs as well as programs incorporating OpenACC directives and CUDA Fortran. Use PGPROF to visualize and diagnose the performance of the components of your program. PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also provides features for helping you to understand why certain parts of your program have high execution times.
Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters and hybrid clusters where each node contains both multicore x64 processors and accelerators. PGPROF can profile parallel programs, including multiprocess MPI programs, multi-threaded OpenMP programs, or a combination of both. PGPROF allows profiling at the function, source code line, and assembly instruction level for PGI-compiled Fortran, C and C++ programs. PGPROF provides views of the performance data for analysis of MPI communication, multiprocess and multi-thread load balancing, and scalability.
Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously. PGPROF also supports a feedback-only mode, which allows you to browse compiler feedback associated with a CCFF-enabled binary executable in the absence of a performance profile.
PGPROF provides the information required to determine which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size and execution time of individual MPI function calls on a line-by-line basis. On GPUs, PGPROF reports performance-critical information including initialization, data transfer and kernel execution times.
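To make the per-line MPI statistics concrete, the following is a minimal Python sketch of the kind of aggregation described above: number of calls, aggregate message size and execution time for each MPI call site. It is not PGPROF's implementation or file format. The function name `aggregate_mpi_calls` and the record layout (source location, MPI function, message bytes, seconds) are assumptions made only for illustration.

```python
from collections import defaultdict


def aggregate_mpi_calls(records):
    """Sum call count, message bytes and time per (source line, MPI function).

    `records` is a hypothetical list of tuples:
    (source_location, mpi_function, message_bytes, seconds).
    """
    totals = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})
    for source_location, mpi_function, message_bytes, seconds in records:
        entry = totals[(source_location, mpi_function)]
        entry["calls"] += 1
        entry["bytes"] += message_bytes
        entry["seconds"] += seconds
    # Return a plain dict so that missing keys raise instead of being created.
    return dict(totals)
```

Sorting the resulting entries by `seconds` would surface the call sites that dominate communication time, which is the question a line-level MPI view is meant to answer.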
Using PGPROF, you can merge profiles from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function or line level. Scalability analysis allows you to quickly see which parts of your application are barriers to scalable performance, and where your parallel tuning efforts should be focused.
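The arithmetic behind a scalability comparison can be sketched in a few lines of Python. This is an illustration of the general technique (speed-up and parallel efficiency relative to the smallest run), not a description of how PGPROF computes or stores its results. The function name `scalability` and the input layout, a mapping from node count to per-region times in seconds, are assumptions.

```python
def scalability(times_by_nodes):
    """Compute speed-up and parallel efficiency per region and node count.

    `times_by_nodes` is a hypothetical mapping:
    {node_count: {region_name: seconds}}.
    The run with the fewest nodes is used as the baseline.
    Returns {(region_name, node_count): (speedup, efficiency)}.
    """
    base_nodes = min(times_by_nodes)
    base_times = times_by_nodes[base_nodes]
    report = {}
    for nodes, regions in sorted(times_by_nodes.items()):
        scale = nodes / base_nodes  # ideal (linear) speed-up for this run
        for region, seconds in regions.items():
            if region not in base_times or seconds <= 0:
                continue  # no baseline or no measurable time: skip
            speedup = base_times[region] / seconds
            report[(region, nodes)] = (speedup, speedup / scale)
    return report
```

An efficiency near 1.0 indicates that a region scales almost linearly. A region whose efficiency falls sharply as nodes are added is a candidate barrier to scalable performance and a natural place to focus tuning.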
Performance data from your application can be collected in a number of ways. Use the pgcollect tool for basic execution-time profiling on x64 host processors and for profiling CUDA Fortran and OpenACC applications running on GPU accelerators. For more specialized needs, PGPROF supports instrumentation-based profiling and sample-based profiling including time-based sampling and event-based sampling using CPU hardware counters.
Analyzing a parallel application can be extremely challenging. PGPROF provides a comprehensive set of graphical user interface (GUI) elements to assist. The PGPROF GUI displays information in familiar, easy-to-use formats such as bar charts, percentages, counts or seconds. PGPROF also supports visualizing a profile using graphical histograms.
With PGPROF, you can quickly determine where execution time is spent and see which functions were called and how often. PGPROF offers a combined view showing both GPU and x64 host performance for an application. Use PGPROF to quickly analyze MPI Sends, MPI Receives and other MPI communication. Examine performance statistics and timings for parallel programs on a per-thread or per-process basis. PGPROF supports function, instruction and source-line level profiling. PGPROF can even be used to effectively profile optimized code at the block level using PGI's unique instrumentation or a sample-based, gprof-style methodology. PGPROF's scalability comparison feature on Linux provides a reliable, low-overhead means to measure linear speed-up or slow-down between multiple executions of an application.
PGPROF complements PGI's powerful MPI and OpenMP parallel graphical debugger PGDBG®.
A partial list of technical features supported includes the following:
PGPROF OpenMP & MPI Profiler
(included with PGI Workstation class and PGI Server class products and the PGI CDK)