PGPROF Graphical Performance Profiler

Performance Profiling for Parallel MPI and OpenMP Applications

PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for MPI-parallel and OpenMP thread-parallel programs. Use PGPROF to visualize and diagnose the performance of the components of your program. Using tables and graphs, PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also helps you understand why certain parts of your program have high execution times.

PGPROF complements PGI's powerful MPI and OpenMP parallel graphical debugger PGDBG®.

Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters and hybrid clusters where each node contains multicore x64 processors. PGPROF can profile parallel programs, including multiprocess MPI programs, multi-threaded OpenMP programs, or a combination of both. PGPROF allows profiling at the function, source code line, and assembly instruction level for PGI-compiled Fortran, C and C++ programs. PGPROF provides views of the performance data for analysis of MPI communication, multiprocess and multi-thread load balancing, and scalability.

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously. PGPROF also supports a feedback-only mode, which allows you to browse compiler feedback associated with a CCFF-enabled binary executable in the absence of a performance profile.




PGPROF provides the information required to determine which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size and execution time of individual MPI function calls on a line-by-line basis.
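
To make the per-line MPI view concrete, the following is a minimal Python sketch of that kind of aggregation: it groups individual MPI call records by source line and function, then totals the call count, message bytes and time. It does not read PGPROF data or use any PGPROF interface. The record layout, the `summarize_mpi_calls` helper and all values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MpiCallSummary:
    """Running totals for one (source line, MPI function) pair."""
    calls: int = 0
    bytes_total: int = 0
    seconds: float = 0.0


def summarize_mpi_calls(events):
    """Aggregate (line, function, message_bytes, seconds) records.

    The record layout is an assumption made for this sketch; it is not
    a PGPROF file format.
    """
    summary = defaultdict(MpiCallSummary)
    for line, func, nbytes, seconds in events:
        entry = summary[(line, func)]
        entry.calls += 1
        entry.bytes_total += nbytes
        entry.seconds += seconds
    return dict(summary)


# Hypothetical call records: (source line, MPI function, bytes, seconds).
events = [
    (120, "MPI_Send", 4096, 0.002),
    (120, "MPI_Send", 4096, 0.003),
    (245, "MPI_Recv", 8192, 0.010),
]

table = summarize_mpi_calls(events)
for (line, func), s in sorted(table.items()):
    print(f"line {line:4d}  {func:<10s} calls={s.calls}  "
          f"bytes={s.bytes_total}  time={s.seconds:.3f}s")
```

Sorting such a table by total time or by aggregate bytes is one way to spot the lines where communication dominates.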



In the figure above, the 'Scale' column shows that some functions, like f_nonbon, scale at about half of linear speedup, while others, like a_next, slow down when run with an increased number of threads. The 'Parallelism' table below shows that execution of mm_fv_update_nonbon is not perfectly balanced between threads, with thread zero spending 33% of the time in that routine, but thread three spending only 20%.
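
One common way to put a number on this kind of imbalance is the ratio of the busiest thread's share to the average share. The Python sketch below assumes that metric; it is not necessarily how PGPROF computes or presents balance. The 33% and 20% figures come from the text above, while the values for threads one and two are hypothetical fillers.

```python
def load_imbalance(per_thread_percent):
    """Return max share divided by mean share (1.0 means perfectly balanced)."""
    if not per_thread_percent:
        raise ValueError("need at least one thread")
    mean = sum(per_thread_percent) / len(per_thread_percent)
    if mean == 0:
        return 1.0  # no time recorded in the routine, so nothing to balance
    return max(per_thread_percent) / mean


# Thread 0 and thread 3 are taken from the text; threads 1 and 2 are
# hypothetical values used only to complete the example.
shares = [33.0, 25.0, 22.0, 20.0]
print(f"imbalance factor: {load_imbalance(shares):.2f}")
```

A factor well above 1.0 suggests that some threads wait while the busiest one finishes, which makes the routine a candidate for a different work distribution.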

Using PGPROF, you can merge profiles from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function or line level. Scalability analysis allows you to quickly see which parts of your application are barriers to scalable performance, and where your parallel tuning efforts should be focused. PGPROF displays information in easy-to-use formats such as bar charts, percentages, counts or seconds, and displays profiles using graphical histograms.
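
The arithmetic behind a scalability comparison can be sketched in a few lines of Python. The example assumes the usual definitions (speedup is the baseline time divided by the scaled time, and efficiency is speedup divided by the ideal linear speedup); the exact formula behind PGPROF's 'Scale' column is not stated here. The function names are borrowed from the figure discussion, and every timing is hypothetical.

```python
def scaling_efficiency(base_seconds, base_workers, scaled_seconds, scaled_workers):
    """Return achieved speedup as a fraction of ideal linear speedup.

    1.0 means perfectly linear scaling. A speedup below 1.0 (the scaled
    run took longer) means the function slowed down with more workers.
    """
    speedup = base_seconds / scaled_seconds
    ideal = scaled_workers / base_workers
    return speedup / ideal


# Hypothetical per-function times (seconds) from two runs: 1 thread and 4 threads.
runs = {
    "f_nonbon": (8.0, 4.0),  # 2x speedup on 4x the threads: half of linear
    "a_next": (2.0, 2.5),    # slower with more threads
}

for name, (t1, t4) in runs.items():
    eff = scaling_efficiency(t1, 1, t4, 4)
    trend = "slows down" if t4 > t1 else "speeds up"
    print(f"{name}: {trend}, {eff:.0%} of linear speedup")
```

Functions with low efficiency, or that slow down outright, are the ones to examine first when tuning for larger thread or process counts.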

Performance data from your application can be collected in a number of ways. Use the pgcollect tool for basic execution-time profiling. For more specialized needs, PGPROF supports instrumentation-based profiling and sample-based profiling, including time-based sampling and event-based sampling using hardware counters.

Powerful GUI
Analyzing a parallel application can be extremely challenging. PGPROF provides a comprehensive set of graphical user interface (GUI) elements to assist. The PGPROF GUI displays information in familiar, easy-to-use formats such as bar charts, percentages, counts or seconds. PGPROF also supports visualizing a profile using graphical histograms.

With PGPROF, quickly determine where execution time is spent and see which functions were called and how often. Use PGPROF to quickly analyze MPI Sends, MPI Receives and other MPI communication. Information on time spent in thread-parallel regions is also readily accessible. PGPROF supports function, instruction and source-line level profiling. PGPROF can even be used to effectively profile optimized code at the block level using PGI's unique instrumentation or a sample-based, gprof-style methodology. PGPROF's scalability comparison feature, which uses hardware counters on Linux, provides a reliable, low-overhead means to measure linear speed-up or slow-down between multiple executions of an application.

Technical Features

A partial list of technical features supported includes the following:

PGPROF OpenMP & MPI Profiler
(included with PGI Workstation class and PGI Server class products and the PGI CDK)

  • Profile Fortran 77, Fortran 95, C and C++ programs
  • For 32-bit and 64-bit multi-core processor-based systems
  • Supports process-level MPI profiling, thread-level OpenMP profiling and hybrid combinations of MPI and OpenMP profiling
  • Supports thread-level OpenMP profiling up to a maximum of 64 threads per process
  • Supports MPICH-1, MPICH-2, MVAPICH, OpenMPI, HP-MPI and MSMPI
  • Graphical and command-line user interfaces
  • Function level (routine), assembly instruction level and source code line level profiling
  • Thread profiling
  • Sample-based MPI profiling
  • MPI communication profiling
  • Measure scalability between multiple execution runs with varying numbers of processes/threads
  • Multiple sortable display formats
    • Histograms
    • Percentage
    • Bar Charts
    • Counts
    • Time in seconds
  • Display collective MPI Sends and Receives
  • Supports profiling using hardware counters (Linux only)
  • Collect and display hardware performance counter data on systems with oprofile installed
  • Supports gprof-style trace files
  • Comprehensive built-in help facilities

System Requirements

  • Hardware: 64-bit x64 or 32-bit x86 processor-based workstation or server with one or more single-core or multi-core AMD64 or Intel 64 microprocessors.
  • Operating System:
    • Linux: OpenMP and MPI profiling are supported on any Linux operating system with kernel revision 2.2.10 or higher. PGPROF is fully interoperable with versions of Linux that use kernel revision 2.4 and glibc 2.3.2 or higher.
    • Mac OS X: OpenMP and OpenMPI profiling are supported on the 64-bit and 32-bit Mac OS X 10.5.x (Leopard) operating system.
    • Windows: OpenMP and MSMPI profiling are supported on 64-bit (Vista, XP Professional x64 Edition, Server 2008 (x64) or Server 2003 x64 Edition) and on 32-bit (Vista, XP Pro, Server 2008 (x86) or Server 2003) Microsoft Windows operating systems with the optional Windows HPC Pack 2008 SDK.
  • Memory: Minimum 128 MB recommended.
  • Hard Disk: 400 MB.
  • Display: Monitor with a minimum resolution of 800 x 600.
  • Peripherals: Mouse or compatible pointing device for use of optional graphical user interfaces. Optional CD-ROM disk drive for installation.