PGPROF Graphical Performance Profiler

Performance Profile Parallel MPI and OpenMP Applications

PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for MPI process-parallel and OpenMP thread-parallel programs as well as programs incorporating OpenACC directives and CUDA Fortran. Use PGPROF to visualize and diagnose the performance of the components of your program. PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also provides features for helping you to understand why certain parts of your program have high execution times.

Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters and hybrid clusters where each node contains both multicore x64 processors and accelerators. PGPROF can profile parallel programs, including multiprocess MPI programs, multi-threaded OpenMP programs, or a combination of both. PGPROF allows profiling at the function, source code line, and assembly instruction level for PGI-compiled Fortran, C and C++ programs. PGPROF provides views of the performance data for analysis of MPI communication, multiprocess and multi-thread load balancing, and scalability.
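Load balancing comes down to comparing how long each process or thread spends in the same piece of code. The sketch below shows one common imbalance metric, (max / mean) - 1, applied to per-worker times. It is a minimal illustration of the general idea under stated assumptions. It is not PGPROF's internal calculation, and the function name and timings are hypothetical.

```python
from statistics import mean


def load_imbalance(times_by_worker):
    """Return (max / mean) - 1 for per-process or per-thread times.

    0.0 means perfectly balanced; 0.5 would mean the slowest worker
    took 50% longer than the average worker.
    (Hypothetical helper, not part of PGPROF.)
    """
    if not times_by_worker:
        raise ValueError("need at least one timing")
    avg = mean(times_by_worker)
    if avg == 0:
        return 0.0
    return max(times_by_worker) / avg - 1.0


# Hypothetical per-process times (seconds) spent in one function.
times = [10.0, 10.0, 10.0, 14.0]
print(f"imbalance: {load_imbalance(times):.2f}")
```

A value well above zero suggests that most workers sit idle at the next synchronization point while the slowest one finishes.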

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously. PGPROF also supports a feedback-only mode, which allows you to browse compiler feedback associated with a CCFF-enabled binary executable in the absence of a performance profile.


PGPROF provides the information required to determine which functions and lines in an application consume the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size and execution time of individual MPI function calls on a line-by-line basis. On GPUs, PGPROF reports performance-critical information including initialization, data transfer and kernel execution times.
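A line-by-line view of MPI activity is essentially a reduction over individual call records, keyed by source location. The sketch below assumes a hypothetical record type (`MpiEvent`) holding the file, line, MPI function, message size and elapsed time of each call. It sums calls, bytes and seconds per location. This is an illustration of the concept only and does not reflect PGPROF's data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MpiEvent:
    # Hypothetical trace record; not a PGPROF file format.
    file: str
    line: int
    func: str        # e.g. "MPI_Send"
    nbytes: int
    seconds: float


def per_line_summary(events):
    """Aggregate call count, total bytes and total time per (file, line, MPI function)."""
    summary = defaultdict(lambda: {"calls": 0, "bytes": 0, "seconds": 0.0})
    for ev in events:
        s = summary[(ev.file, ev.line, ev.func)]
        s["calls"] += 1
        s["bytes"] += ev.nbytes
        s["seconds"] += ev.seconds
    return dict(summary)


# Hypothetical events from one process.
events = [
    MpiEvent("solver.c", 120, "MPI_Send", 8192, 0.002),
    MpiEvent("solver.c", 120, "MPI_Send", 8192, 0.003),
    MpiEvent("solver.c", 145, "MPI_Recv", 4096, 0.010),
]
for key, s in sorted(per_line_summary(events).items()):
    print(key, s)
```

Keying by source line rather than by MPI function alone makes it possible to tell apart two `MPI_Send` call sites that behave very differently.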

Using PGPROF, you can merge profiles from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function or line level. Scalability analysis allows you to quickly see which parts of your application are barriers to scalable performance, and where your parallel tuning efforts should be focused.
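Scalability analysis compares the same function (or line) across runs with different process or thread counts. The sketch below computes speedup as T_base / T_scaled and parallel efficiency as speedup divided by the ratio of process counts. It assumes the per-function times are wall-clock times (for example, the maximum across processes). The helper name and the sample timings are hypothetical and do not come from PGPROF.

```python
def scalability(baseline, scaled, p_base, p_scaled):
    """Compare per-function times from two runs.

    Returns {function: (speedup, efficiency)} where
      speedup    = T_base / T_scaled
      efficiency = speedup / (p_scaled / p_base)
    Efficiency near 1.0 indicates linear speed-up; a speedup below 1.0
    indicates a slow-down. (Hypothetical helper, not part of PGPROF.)
    """
    ratio = p_scaled / p_base
    result = {}
    for func, t_base in baseline.items():
        t_scaled = scaled.get(func)
        if t_scaled is None or t_scaled <= 0:
            continue  # function missing or untimed in the scaled run
        speedup = t_base / t_scaled
        result[func] = (speedup, speedup / ratio)
    return result


# Hypothetical per-function wall-clock times (seconds) from 4- and 8-process runs.
run4 = {"compute": 40.0, "exchange": 6.0}
run8 = {"compute": 20.0, "exchange": 8.0}
for func, (s, e) in scalability(run4, run8, 4, 8).items():
    print(f"{func:10s} speedup={s:.2f} efficiency={e:.2f}")
```

In this made-up example, `compute` scales linearly while `exchange` slows down, which is the kind of function a scalability view points you toward.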

Performance data from your application can be collected in a number of ways. Use the pgcollect tool for basic execution-time profiling on x64 host processors and for profiling CUDA Fortran and OpenACC applications running on GPU accelerators. For more specialized needs, PGPROF supports instrumentation-based profiling and sample-based profiling including time-based sampling and event-based sampling using CPU hardware counters.
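Time-based sampling estimates where time goes by periodically recording which function the program is executing. With a fixed sampling interval, the estimated time for a function is roughly its sample count multiplied by the interval. The sketch below turns samples that are already mapped to function names into a gprof-style flat profile. The sampling interval, function names and sample counts are assumptions for illustration. This is not how pgcollect stores or processes data.

```python
from collections import Counter


def flat_profile(samples, interval_s):
    """Build a gprof-style flat profile from sampled function names.

    Returns a list of (function, estimated seconds, percent of samples),
    sorted by estimated time, highest first. (Hypothetical helper.)
    """
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return []
    rows = [(func, n * interval_s, 100.0 * n / total) for func, n in counts.items()]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


# Hypothetical samples, assuming one sample every 10 ms.
samples = ["matmul"] * 75 + ["init"] * 20 + ["io_write"] * 5
for func, secs, pct in flat_profile(samples, 0.010):
    print(f"{func:10s} {secs:6.2f} s {pct:5.1f} %")
```

Because each estimate is built from a finite number of samples, functions that receive only a handful of samples carry proportionally more statistical uncertainty.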

Powerful GUI
Analyzing a parallel application can be extremely challenging. PGPROF provides a comprehensive set of graphical user interface (GUI) elements to assist. The PGPROF GUI displays information in familiar, easy-to-use formats such as bar charts, percentages, counts or seconds. PGPROF also supports visualizing a profile using graphical histograms.

With PGPROF, quickly determine where execution time is spent and see which functions were called and how often. PGPROF offers a combined view showing both GPU and x64 host performance for an application. Use PGPROF to quickly analyze MPI Sends, MPI Receives and other MPI communication. Examine performance statistics and timings for parallel programs on a per-thread or per-process basis. PGPROF supports function, instruction and source-line level profiling. PGPROF can even be used to effectively profile optimized code at the block level using PGI's unique instrumentation or a sample-based, gprof-style methodology. PGPROF's scalability comparison feature on Linux provides a reliable, low-overhead means to measure linear speed-up or slow-down between multiple executions of an application.

PGPROF complements PGI's powerful MPI and OpenMP parallel graphical debugger PGDBG®.

Technical Features

A partial list of technical features supported includes the following:

PGPROF OpenMP & MPI Profiler
(included with PGI Workstation class and PGI Server class products and the PGI CDK)

  • Profile Fortran, C and C++ programs
  • For 32-bit and 64-bit multi-core processor-based systems with or without accelerators
  • Supports process-level MPI profiling, thread-level OpenMP profiling and hybrid combinations of MPI and OpenMP profiling
  • Supports profiling OpenACC and CUDA Fortran codes on NVIDIA CUDA-enabled GPU accelerators and profiling OpenACC codes on Radeon accelerators
  • Supports thread-level OpenMP profiling up to a maximum of 64 threads per process
  • Supports MPICH, MVAPICH2, Open MPI, SGI-MPI and MS-MPI
  • Graphical and command-line user interfaces
  • Function level (routine), assembly instruction level and source code line level profiling
  • Thread profiling
  • MPI communication profiling
  • Display detailed system configuration information used to create the profile
  • Measure scalability between multiple execution runs with varying number of processes/threads
  • Multiple sortable display formats
    • Histograms
    • Percentage
    • Bar Charts
    • Counts
    • Time in seconds
  • Display collective MPI Sends and Receives
  • Collect and display hardware performance counter data on systems with oprofile installed
  • Support gprof-style trace files
  • Comprehensive built-in help facilities

System Requirements

  • Hardware: 64-bit x64 or 32-bit x86 processor-based workstation or server with one or more single-core or multi-core AMD64 or Intel 64 microprocessors.
  • Operating System:
    • Linux: OpenMP and MPI profiling is supported on any Linux operating system with kernel revision 2.2.10 or newer. PGPROF is fully interoperable with versions of Linux which use kernel revision 2.4 and glibc 2.3.2 or newer.
    • OS X: OpenMP and MPI profiling are supported on 64-bit and 32-bit Mac OS X 10.6.x (Snow Leopard) or later.
    • Windows: MS-MPI profiling is supported on Microsoft Windows 7, 8, 8.1, Server 2008 R2 and Server 2012. The Microsoft HPC Pack 2012 MS-MPI Redistributable Pack is included with PGI products for these operating systems. OpenMP profiling is supported on Microsoft Windows XP and later operating systems.
  • Memory: Minimum 128 MB recommended.
  • Hard Disk: 400 MB.
  • Display: Requires a minimum of 800 x 600 resolution monitor.
  • Peripherals: Mouse or compatible pointing device for use of optional graphical user interfaces.
  • Other: Adobe Acrobat Reader for viewing documentation.