PGPROF Graphical Performance Profiler

Performance Profile Parallel OpenACC and OpenMP Applications

PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for parallel programs written with OpenMP or OpenACC directives or accelerated using CUDA. Use PGPROF to visualize and diagnose the performance of the components of your program. PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also helps you understand why parts of your program have high execution times.

Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters, and hybrid clusters where each node contains both multicore x64 processors and accelerators. PGPROF can profile multi-threaded OpenMP programs, GPU-accelerated programs, or a combination of both. PGPROF supports profiling at the function level or the source-code line level for Fortran, C and C++ programs built with PGI or non-PGI compilers.

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously.


PGPROF provides the information necessary for determining which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. On GPUs, PGPROF reports performance-critical information including initialization, data transfer and kernel execution times.

Performance data is collected from your application using a very low-overhead, sample-based method that does not require you to recompile. To gather information about GPU-accelerated applications, PGPROF also queries the OpenACC tools interface for details about each OpenACC kernel as it executes. The CUDA runtime is profiled as well, providing information about your program's execution on the GPU.

PGPROF offers a combined view showing both GPU and x64 host performance for an application. On the CPU, you can examine performance statistics on a per-thread basis. On the GPU, performance statistics are presented on a per-kernel basis.

Technical Features

A partial list of technical features supported includes the following:

PGPROF OpenMP & OpenACC Profiler
(included with PGI Workstation class and PGI Server class products and the PGI CDK)

  • Profile serial, parallel and accelerated Fortran, C and C++ programs
  • Profile OpenACC kernels
  • For 64-bit multicore processor-based systems with or without accelerators
  • Supports thread-level OpenMP profiling
  • Supports profiling OpenACC and CUDA Fortran codes on NVIDIA CUDA-enabled GPU accelerators
  • Graphical and command-line user interfaces
  • Function level (routine) and source code line level profiling
  • Comprehensive built-in help facilities

System Requirements

  • Hardware: 64-bit x64 processor-based workstation or server with one or more single-core or multicore AMD64 or Intel 64 microprocessors. GPU profiling is supported on NVIDIA CUDA-enabled GPUs only.
  • Operating System:
    • Linux: Fedora 22 Workstation, Ubuntu 14.04 and 16.04, Red Hat 6 and 7, CentOS 6 and 7, OpenSUSE 13.2, SLES 11 SP3/SP4 and 12
    • OS X: Mac OS X 10.11 (El Capitan)
    • Windows: 7, 8.1, 10 and Windows Server 2008 R2, 2012 R2, 2016
  • CUDA Driver: version 352 (CUDA 7.5) or newer for GPU profiling.
  • Java: Runtime environment version 7 or newer.
  • Memory: Minimum 1 GB, 2 GB recommended.
  • Other: Web browser for viewing documentation