PGPROF Graphical Performance Profiler

Performance Profile Parallel MPI and OpenMP Cluster Applications

PGPROF® is an interactive, powerful and simple-to-use postmortem statistical analyzer for MPI-parallel and OpenMP thread-parallel programs running on Linux or Microsoft Windows clusters. Use PGPROF to analyze programs on SMP Servers, distributed-memory clusters and hybrid clusters where each node contains multiple 64-bit or 32-bit multi-core processors. PGPROF allows profiling at the function, instruction and source-code line level for F77, F95, HPF, C and C++.

PGPROF complements PGI's powerful MPI and OpenMP parallel graphical cluster debugger PGDBG®.

PGPROF is included with all PGI products except PGI Visual Fortran™ Standard Edition. PGPROF is available in two versions. The version included with PGI Workstation class and PGI Server class products for Linux, and with the PGI® CDK™ Cluster Development Kit® for Linux and for Microsoft Windows, is configured for analyzing both MPI-parallel and OpenMP thread-parallel applications. The version included with the PGI Workstation class and PGI Server class products for Mac OS X and for Microsoft Windows is configured for OpenMP thread-parallel performance profiling and analysis.



Powerful GUI
Analyzing a cluster application can be extremely challenging. PGPROF provides a comprehensive set of graphical user interface (GUI) elements to assist. The PGPROF GUI displays information in intuitive, easy-to-use formats such as bar charts, percentages, counts or seconds. PGPROF also supports visualizing a profile using graphical histograms.

With PGPROF, quickly determine where execution time is spent and see which functions were called and how often. Use PGPROF to quickly analyze MPI Sends, MPI Receives and other MPI communication. Information on time spent in thread-parallel regions is also readily accessible. PGPROF supports function, instruction and source-line level profiling. PGPROF can even be used to effectively profile optimized code at the block level using PGI's unique instrumentation or a sample-based gprof-style methodology.

Workflow
The following is a typical workflow for quickly finding the hotspots, or kernels, of an application.

  1. Sort the application at the function level by Time or Cost;
  2. Drill into the critical function to display the profiled source code;
  3. Sort the source code at the line level by Time or Cost; and
  4. View the hotspot.
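
The same four steps can be sketched in a few lines of Python. This is only an illustration of the sort-and-drill-down logic, not PGPROF's implementation or file format: the sample data, the function names and the helper names (time_by_function, lines_in_function, find_hotspot) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-line profile data: (function, source line, seconds).
# Illustrative values only; they were not produced by PGPROF.
samples = [
    ("solve", 120, 4.0),
    ("solve", 121, 1.5),
    ("exchange_halo", 88, 2.0),
    ("init", 12, 0.5),
]


def time_by_function(samples):
    """Sum line-level times per function, sorted by time, largest first."""
    totals = defaultdict(float)
    for func, _line, seconds in samples:
        totals[func] += seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def lines_in_function(samples, func):
    """Return (line, seconds) pairs for one function, largest time first."""
    rows = [(line, seconds) for f, line, seconds in samples if f == func]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def find_hotspot(samples):
    """Follow the workflow: sort functions, drill in, sort lines, report."""
    functions = time_by_function(samples)                     # step 1
    critical, _total = functions[0]                           # step 2
    line, seconds = lines_in_function(samples, critical)[0]   # step 3
    return critical, line, seconds                            # step 4


if __name__ == "__main__":
    print(find_hotspot(samples))
```

With this made-up data the most expensive function is "solve", and its most expensive line is line 120.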

Technical Features

A partial list of technical features supported includes the following:

PGPROF OpenMP & MPI Cluster Profiler
(included with PGI Workstation class and PGI Server class products for Linux and the PGI CDK)

  • Profile Fortran77, F95, C and C++ programs
  • For 32-bit and 64-bit multi-core processor-based systems
  • Supports process-level MPI profiling, thread-level OpenMP profiling and hybrid combinations of MPI and OpenMP profiling
  • GUI or command-line profiling
  • Function level (routine), assembly instruction level and source code line level profiling
  • Thread profiling
  • Sample based MPI profiling
  • MPI communication profiling
  • Measure scalability between multiple execution runs with varying numbers of processes/threads
  • Multiple sortable display formats
    • Histograms
    • Percentage
    • Bar Charts
    • Counts
    • Time in seconds
    • Absolute value
  • Display collective MPI Sends and Receives
  • Supports hardware counter-based profiling using PAPI
  • Collects and displays hardware performance counter data on systems with oprofile installed
  • Supports gprof-style trace files
  • Comprehensive built-in help facilities
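
As a rough illustration of the scalability comparison mentioned in the list above, the following Python sketch computes speedup and parallel efficiency from the wall-clock times of several runs. It uses the standard textbook definitions; the source does not state which scalability metrics PGPROF itself reports, and the function name scaling_table and the timings are hypothetical.

```python
def scaling_table(run_times):
    """Compute speedup and efficiency relative to the smallest run.

    run_times maps a process/thread count to wall-clock seconds.
    Returns a list of (workers, speedup, efficiency) tuples.
    """
    base_workers = min(run_times)
    base_time = run_times[base_workers]
    rows = []
    for workers in sorted(run_times):
        speedup = base_time / run_times[workers]
        # Efficiency is the speedup divided by the relative increase in workers.
        efficiency = speedup * base_workers / workers
        rows.append((workers, speedup, efficiency))
    return rows


# Hypothetical timings for runs on 1, 2 and 4 processes (not measured data).
runs = {1: 100.0, 2: 50.0, 4: 40.0}

if __name__ == "__main__":
    for workers, speedup, efficiency in scaling_table(runs):
        print(f"{workers:>3} workers: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency that drops well below 1.0 as the worker count grows, as in the 4-process row of this made-up data, suggests that communication or serial sections are limiting scalability.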

PGPROF OpenMP Profiler
(included with PGI Workstation class and PGI Server class products for Mac OS X and for Microsoft Windows including PVF Workstation Complete)

  • Supports thread-level OpenMP profiling.

System Requirements

  • Hardware: 64-bit x64 or 32-bit x86 processor-based workstation or server with one or more single-core or multi-core AMD Opteron or Intel Core 2 microprocessors.
  • Operating System: OpenMP and MPI profiling is supported on any Linux operating system with kernel revision 2.2.10 or higher. PGPROF is fully interoperable with versions of Linux which use kernel revision 2.4 and glibc 2.3.2 or higher.
    OpenMP profiling is supported on 64-bit (Leopard) and on 32-bit (Tiger and Leopard) Mac OS X operating systems.
    OpenMP profiling is supported on 64-bit (XP Professional x64 Edition, Vista or Windows Server 2003 x64 Edition) and on 32-bit (Windows 2000, XP or Vista) Microsoft Windows operating systems. In addition, the PGI CDK for Windows Compute Cluster Server 2003 also supports profiling of Microsoft MPI (MSMPI) applications.
  • Memory: Minimum 32 MB per Cluster Node. 128 MB recommended for Front-end Node.
  • Hard Disk: 400 MB on front-end node; 50 MB on each Cluster Node.
  • Display: Monitor with a minimum resolution of 800 x 600. 16-bit color mode with the KDE or GNOME desktop is recommended, with the display set to the local host.
  • Peripherals: Mouse or compatible pointing device for use of optional graphical user interfaces. CD-ROM disk drive for installation.