The PGI suite of parallel application tools helps speed development by delivering a comprehensive, compatible set of application debugging and performance profiling capabilities. Only PGI provides a complete solution for high-performance application development. PGI tools are included with all PGI products.

The PGI graphical parallel debugger can debug both MPI-parallel and OpenMP thread-parallel Linux applications. The version included with PGI Professional network floating licenses can debug up to 256 remote MPI processes, each with up to 64 OpenMP threads. The version included with other PGI products can debug up to 16 local MPI processes, each with up to 64 OpenMP threads.

The PGI graphical parallel performance profiler can profile multi-threaded OpenMP SMP programs or multi-stream OpenACC programs. It supports sample-based profiling and tracing, with correlation at the routine, instruction and source-line levels.

The PGI Debugger for Linux x86, macOS and Windows can debug serial and parallel programs, including MPI process-parallel, OpenMP thread-parallel and hybrid MPI/OpenMP applications. It can debug programs on SMP workstations, servers, distributed-memory clusters and hybrid clusters where each node contains multiple 64-bit multicore processors.

The PGI Debugger is included with all PGI products. The version included with PGI Professional network floating licenses can debug up to 256 parallel MPI processes. The version included with all other PGI products can debug up to 16 local MPI processes. In both cases, the PGI Debugger can debug up to 64 OpenMP threads per MPI process.
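When planning a debugging session for a hybrid MPI/OpenMP job, the limits above reduce to a simple check. The sketch below is a hypothetical planning helper (`fits_debugger_limits` and `max_debuggable_threads` are illustrative names, not part of any PGI tool); it assumes only the numbers stated in this section: 256 remote MPI processes with a Professional network floating license, 16 local MPI processes otherwise, and 64 OpenMP threads per process in both cases.

```python
# Hypothetical planning helper; not part of the PGI tools.
# Limits are taken from the product description above.
MAX_THREADS_PER_PROCESS = 64
MAX_PROCESSES_FLOATING = 256  # PGI Professional network floating license (remote)
MAX_PROCESSES_OTHER = 16      # all other PGI products (local)


def fits_debugger_limits(mpi_processes: int, threads_per_process: int,
                         floating_license: bool) -> bool:
    """Return True if a hybrid MPI/OpenMP job is within the stated debugging limits."""
    if mpi_processes < 1 or threads_per_process < 1:
        raise ValueError("process and thread counts must be positive")
    max_processes = MAX_PROCESSES_FLOATING if floating_license else MAX_PROCESSES_OTHER
    return (mpi_processes <= max_processes
            and threads_per_process <= MAX_THREADS_PER_PROCESS)


def max_debuggable_threads(floating_license: bool) -> int:
    """Largest total thread count implied by the limits (processes x threads per process)."""
    max_processes = MAX_PROCESSES_FLOATING if floating_license else MAX_PROCESSES_OTHER
    return max_processes * MAX_THREADS_PER_PROCESS


if __name__ == "__main__":
    print(fits_debugger_limits(32, 8, floating_license=False))   # too many local processes
    print(fits_debugger_limits(128, 16, floating_license=True))  # within floating limits
    print(max_debuggable_threads(True), max_debuggable_threads(False))
```

With a floating license the implied ceiling is 256 x 64 = 16,384 threads; with other products it is 16 x 64 = 1,024.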

Control Threads & Processes Separately
The PGI Debugger provides the ability to separately debug and control OpenMP threads, pthreads, and MPI processes running on a Linux cluster. Perform Step, Break, Run and Halt actions on threads or processes individually or collectively as a group.

(Screenshot: PGI Debugger GUI)

Powerful GUI
Debugging a cluster application can be extremely challenging. The PGI Debugger provides a comprehensive set of graphical user interface (GUI) elements to help. From a single window, you have precise control over each process in your cluster application and each thread on each multicore cluster node. The Main window displays Fortran, C or C++ program source code and includes one-touch buttons for actions such as Run, Break, Quit and Step. Additional controls on the Main window let you select and control threads and processes individually or collectively, and choose the commands applied to them.

Tabs in the Source panel of the Main window let you display source code only; disassembly, showing how the currently executing high-level source code was compiled into assembly language; or a mixed view in which assembly code is interleaved with the source code. Stepping and breakpoint indicators also work at the assembly level.

In addition to the main source code window, the PGI Debugger provides supplementary program information in a number of tabbed panels:

  • Call Stack tab displays the sequence of nested procedure calls
  • Register tab displays register values in a variety of formats
  • Locals tab displays contents of variables in the current scope
  • Memory tab displays a region of memory
  • Command tab provides a command line interface
  • Events tab shows current breakpoints, watchpoints, etc.
  • Process/Thread tab provides a graphical view of the application state
  • Status tab provides a text-based view of the application state
  • MPI Messages tab provides a dump of MPI message queues (MPI only)
  • Groups tab supports named grouping of OpenMP threads and MPI processes

Easy-to-use Features
The PGI Debugger handles Fortran, C and C++ programs. It is DBX-compatible and includes an extended command language for setting breakpoints and watchpoints and for evaluating expressions. Use it to control execution and examine the state of a program, either symbolically at the source level or at the assembly level. The PGI Debugger lets you switch contexts between threads in a parallel region and step or examine the state of any executing thread. It also interoperates with the GNU compilers and other compilers that generate DWARF-format debugging information. More information is available in the PGI Debugger Guide.

The PGI Debugger complements PGI's graphical performance analysis profiler for OpenACC and OpenMP parallel programs.

Technical Features

A partial list of technical features supported includes the following:

OpenMP & MPI Debugger

  • Debug Fortran, C and C++ programs
  • Debug assembly language
  • Debug parallel OpenMP and MPI programs
  • Supports MPICH3, MVAPICH2, Open MPI, SGI-MPI and MSMPI libraries
  • Supports pthreads on Linux and OS X, and native threads on Windows
  • PGI Community Edition and the node-locked version of PGI Professional Edition support local debugging of up to 16 MPI processes
  • PGI Professional Edition network floating version supports process-level MPI debugging, thread-level OpenMP debugging and hybrid MPI/OpenMP debugging for up to 256 remote MPI processes
  • Supports thread-level OpenMP debugging of up to 64 threads per process
  • Examine message queues
  • Debug shared objects loaded at randomized addresses on Linux, dynamic libraries on OS X and DLLs on Windows
  • DBX compatible commands
  • One touch breakpoint setting
  • Step into, over or out of functions
  • Watchpoints
  • Traceback
  • One touch symbolic display
  • Read and process core files
  • Multiple format display of values or strings
  • Log files
  • Interoperable with GNU and other compilers that generate DWARF format debugging information.

Visual Debugging

  • Source code display in source, assembly or interleaved.
  • Execution control buttons.
  • Process/thread selectors.
  • Debug information tabs for call stack, registers, local variables, memory, MPI messages, process and thread state.

Process/Thread Control

  • Control OpenMP threads individually or collectively
  • Control processes individually or collectively
  • Auto-detect MPI processes
  • Auto-detect OpenMP threads
  • View OpenMP private data
  • Inspect the MPI message queues for each MPI process
  • Color-coded process/thread state
  • Synchronize threads/processes
  • For each process or thread:
    - Perform Step, Break, Run and Halt actions
    - Display process/thread status and location
    - Switch contexts between threads/processes

System Requirements

  • Hardware: 64-bit x86 processor-based workstation or server with one or more single core or multicore microprocessors.
  • Operating System:
    • Linux: OpenMP and MPI debugging is supported on any Linux operating system with kernel revision 2.2.10 or newer. The PGI Debugger is fully interoperable with versions of Linux which use kernel revision 2.4 and glibc 2.3.2 or newer.
    • macOS: OpenMP and MPI debugging of 64-bit applications is supported on 64-bit OS X 10.9, 10.10 and 10.11 (macOS 10.12 Sierra is not currently supported).
    • Windows: OpenMP and MSMPI debugging is supported on 64-bit Microsoft Windows 7 and newer operating systems.
  • Hard Disk: 400 MB on front-end node; 50 MB on each cluster node.
  • Display: Minimum monitor resolution of 800 x 600.
  • Memory: Minimum 32 MB per cluster node. 128 MB recommended for front-end node. (Note: Memory requirements for the front-end node increase dramatically for MPI programs with randomized loading.)
  • Peripherals: Mouse or compatible pointing device for use of optional graphical user interfaces.
  • Other: Adobe Acrobat Reader for viewing documentation.
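To check a Linux host against the kernel and glibc thresholds listed above, a minimal sketch using Python's standard `platform` module follows. The helper names are hypothetical and this is not a PGI-provided tool. It assumes that only the leading numeric part of a version string matters (for example, `5.15.0-91-generic` is read as 5.15.0), and it compares versions numerically, because a plain string comparison would wrongly rank 2.2.9 above 2.2.10.

```python
import platform
import re

# Thresholds copied from the Linux requirement above.
MIN_KERNEL_FOR_DEBUGGING = "2.2.10"
MIN_KERNEL_FOR_INTEROP = "2.4"
MIN_GLIBC_FOR_INTEROP = "2.3.2"


def version_tuple(text: str) -> tuple:
    """Parse the leading dotted number of a version string.

    Assumption: only the leading numeric part matters, so
    '5.15.0-91-generic' -> (5, 15, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(found: str, required: str) -> bool:
    """Compare versions numerically, padding with zeros so '2.4' equals '2.4.0'."""
    a, b = version_tuple(found), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_linux_host() -> dict:
    """Report which of the stated Linux thresholds the current host meets."""
    kernel = platform.release()
    libc_name, libc_version = platform.libc_ver()
    has_glibc = libc_name == "glibc" and bool(libc_version)
    return {
        f"kernel >= {MIN_KERNEL_FOR_DEBUGGING} (OpenMP/MPI debugging)":
            at_least(kernel, MIN_KERNEL_FOR_DEBUGGING),
        f"kernel >= {MIN_KERNEL_FOR_INTEROP} (full interoperability)":
            at_least(kernel, MIN_KERNEL_FOR_INTEROP),
        f"glibc >= {MIN_GLIBC_FOR_INTEROP} (full interoperability)":
            has_glibc and at_least(libc_version, MIN_GLIBC_FOR_INTEROP),
    }


if __name__ == "__main__":
    if platform.system() == "Linux":
        for requirement, ok in check_linux_host().items():
            print(f"{requirement}: {'yes' if ok else 'no'}")
    else:
        print("These thresholds apply to Linux hosts only.")
```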

The PGI Profiler (a/k/a PGPROF®) is a powerful, easy-to-use interactive postmortem statistical analyzer for parallel programs written with OpenMP or OpenACC directives or accelerated using CUDA. Use it to visualize and diagnose the performance of the components of your program. The PGI Profiler associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Using resource utilization data and compiler feedback information, it also helps you understand why parts of your program have high execution times.

Use the PGI Profiler to analyze programs on multicore SMP servers, distributed-memory clusters and hybrid clusters where each node contains both multicore x86 processors and accelerators. It can profile multi-threaded OpenMP programs, GPU-accelerated programs or a combination of both. The PGI Profiler supports profiling at the function or source-line level for Fortran, C and C++ programs compiled with PGI or non-PGI compilers.

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously.

(Screenshot: PGI Profiler)

The PGI Profiler provides the information needed to determine which functions and lines in an application consume the most execution time. Combined with the feedback features of the PGI compilers, it helps you maximize vectorization and performance on a single processor core. On GPUs, it reports performance-critical information, including initialization, data transfer and kernel execution times.

Performance data from your application is collected with a very low-overhead, sample-based method that does not require you to recompile your application. For GPU-accelerated applications, the profiler also queries the OpenACC tools interface for details about each OpenACC kernel as it executes. The CUDA runtime is also profiled to provide information about your program's execution on the GPU.
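A sample-based profiler periodically interrupts the running program and records where each thread is executing; the fraction of samples that land in a function estimates the fraction of execution time spent there. The sketch below shows only that aggregation step. The `(thread, function)` sample tuples and function names are hypothetical illustration data, not PGPROF's internal format, and no actual sampling is performed.

```python
from collections import Counter, defaultdict


def flat_profile(samples):
    """Return {function: percent of all samples}, ordered by descending share."""
    counts = Counter(function for _thread, function in samples)
    total = sum(counts.values())
    return {fn: 100.0 * n / total for fn, n in counts.most_common()}


def per_thread_profile(samples):
    """Return {thread: {function: sample count}} for a per-thread CPU view."""
    result = defaultdict(Counter)
    for thread, function in samples:
        result[thread][function] += 1
    return {thread: dict(counter) for thread, counter in result.items()}


# Hypothetical samples: (OpenMP thread id, function executing when sampled).
samples = [
    (0, "matmul"), (0, "matmul"), (0, "init"), (0, "matmul"),
    (1, "matmul"), (1, "matmul"), (1, "matmul"), (1, "reduce"),
]

if __name__ == "__main__":
    for fn, pct in flat_profile(samples).items():
        print(f"{fn:8s} {pct:5.1f}%")
    print(per_thread_profile(samples))
```

Because the estimate is statistical, short-lived functions may receive few or no samples; longer runs give more reliable percentages.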

The PGI Profiler offers a combined view of both GPU and CPU host performance for an application. On the CPU, you can examine performance statistics on a per-thread basis. On the GPU, performance statistics are presented on a per-kernel basis.
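As a minimal sketch of what per-kernel GPU statistics involve, the code below totals kernel execution intervals by kernel name. The `KernelEvent` record, the kernel names and the timestamps are hypothetical; the profiler's actual GPU data comes from the OpenACC tools interface and CUDA runtime profiling, and neither format is modeled here.

```python
from collections import defaultdict
from typing import NamedTuple


class KernelEvent(NamedTuple):
    """Hypothetical trace record: one GPU kernel execution."""
    name: str
    start_us: float  # start timestamp, microseconds
    end_us: float    # end timestamp, microseconds


def per_kernel_stats(events):
    """Return {kernel name: (launches, total time in us, mean time in us)}."""
    totals = defaultdict(float)
    launches = defaultdict(int)
    for ev in events:
        if ev.end_us < ev.start_us:
            raise ValueError(f"{ev.name}: end timestamp precedes start")
        totals[ev.name] += ev.end_us - ev.start_us
        launches[ev.name] += 1
    return {name: (launches[name], totals[name], totals[name] / launches[name])
            for name in totals}


# Made-up trace for illustration only.
events = [
    KernelEvent("saxpy_kernel", 0.0, 120.0),
    KernelEvent("reduce_kernel", 130.0, 180.0),
    KernelEvent("saxpy_kernel", 200.0, 310.0),
]

if __name__ == "__main__":
    stats = per_kernel_stats(events)
    # Print kernels in order of total GPU time, largest first.
    for name, (count, total, mean) in sorted(
            stats.items(), key=lambda kv: kv[1][1], reverse=True):
        print(f"{name:14s} launches={count} total={total:.1f}us mean={mean:.1f}us")
```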

Technical Features

A partial list of technical features supported includes the following:

OpenMP & OpenACC Profiler

  • Profile serial, parallel and accelerated Fortran, C and C++ programs
  • Profile OpenACC kernels
  • For 64-bit multicore processor-based systems with or without accelerators
  • Supports thread-level OpenMP profiling
  • Supports profiling OpenACC and CUDA Fortran codes on NVIDIA CUDA-enabled GPU accelerators
  • Graphical and command-line user interfaces
  • Function level (routine) and source code line level profiling*
  • Comprehensive built-in help facilities

* Current limitations in the open source LLVM back-end prevent source line correlation when using the PGI Profiler on OpenPOWER.

System Requirements

  • Hardware: 64-bit OpenPOWER or x86 processor-based workstation or server with one or more single core or multicore microprocessors. GPU profiling is supported on NVIDIA CUDA-enabled GPUs only.
  • Operating System:
    • Linux-OpenPOWER: Ubuntu 14.04 and 14.10, Red Hat Enterprise Linux 7.3
    • Linux-x86: Fedora 22 Workstation, Ubuntu 14.04 and 16.04, Red Hat 6 and 7, CentOS 6 and 7, OpenSUSE 13.2, SLES 11 SP3/SP4 and 12
    • macOS: Mac OS X 10.11 (El Capitan)
    • Windows: 7, 8.1 and 10, plus Windows Server 2008 R2, 2012 R2 and 2016.
  • CUDA Driver: version 352 (CUDA 7.5) or newer for GPU profiling.
  • Java: Runtime environment version 7 or newer.
  • Memory: Minimum 1 GB, 2 GB recommended.
  • Other: Web browser for viewing documentation