PGI CDK Cluster Development Kit

Parallel Fortran, C and C++ Compilers & Tools for Programming HPC Clusters

The PGI CDK® Cluster Development Kit® enables use of networked clusters of AMD or Intel x64 processor-based workstations and servers to tackle the largest scientific computing applications. The PGI CDK includes pre-configured versions of MPI for Ethernet and InfiniBand to enable development, debugging and tuning of high-performance MPI or hybrid MPI/OpenMP applications written in Fortran, C or C++.
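
As a point of reference for the kind of hybrid code this supports, the short sketch below combines MPI message passing with OpenMP threading in C. The file name, build line and launch line are illustrative only, and assume one of the bundled MPI installations plus an OpenMP-capable compiler.

    /* hybrid_hello.c -- minimal hybrid MPI/OpenMP sketch (illustrative).
       Example build:  mpicc -mp hybrid_hello.c -o hybrid_hello
       Example launch: mpiexec -n 4 ./hybrid_hello                        */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* initialize MPI for this process */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* OpenMP shared-memory parallelism inside each MPI process */
        #pragma omp parallel
        {
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, size, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }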

Parallel Fortran, C and C++ Compilers

PGI compilers offer world-class performance and features including auto-parallelization and OpenMP 3.1 directive-based parallelization for multicore processors, OpenACC 2.0 directive-based parallel programming for accelerators, and support for PGI Unified Binary™ technology. The PGI Unified Binary streamlines cross-platform support by combining code optimized for multiple x64 processors into a single executable file. This ensures your applications run correctly and with optimal performance regardless of the type of x64 processor on which they are deployed, or whether your system includes an accelerator. PGI's state-of-the-art compiler optimization technologies include SSE vectorization, auto-parallelization, interprocedural analysis and optimization, memory hierarchy optimizations, function inlining (including library functions), profile feedback optimization, CPU-specific microarchitecture optimizations and more.
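
For illustration, a minimal OpenACC loop in C might look like the following. The array names, sizes and the example pgcc command line (using the documented -acc and -Minfo options) are meant only as a sketch of the directive style, not as a tuned implementation.

    /* saxpy_acc.c -- illustrative OpenACC directive-based parallel loop.
       Example build: pgcc -acc -Minfo=accel saxpy_acc.c -o saxpy_acc     */
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static float x[N], y[N];
        int i;

        for (i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        /* The directive asks the compiler to parallelize this loop and,
           when an accelerator is present, offload it to the device.      */
        #pragma acc parallel loop
        for (i = 0; i < N; ++i)
            y[i] = 2.0f * x[i] + y[i];

        printf("y[0] = %f\n", y[0]);
        return 0;
    }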

About PGI Accelerator CDK

PGI offers separate products for x64+accelerator or x64-only platforms. "PGI Accelerator" products—the x64+accelerator platform products—include support for the directive-based OpenACC programming model, CUDA Fortran and CUDA-x86. Supported accelerators include CUDA-enabled NVIDIA GPUs and select AMD Radeon GPUs.

The PGDBG OpenMP/MPI Debugger

Debugging a cluster MPI application can be extremely challenging. The PGDBG® debugger provides a comprehensive set of graphical user interface (GUI) elements to assist you in this process. PGDBG provides the ability to separately debug and control OpenMP threads and MPI processes on your Linux cluster. Step, Break, Run or Halt OpenMP threads or MPI processes individually, as a group, or in user-defined process/thread subsets. PGDBG can even display the state of MPI message queues, enabling you to quickly isolate and resolve message-passing deadlock bugs. Using a single integrated multi-process debugging window, PGDBG provides precise control and feedback on the state of every MPI process and OpenMP thread simultaneously. Its debugging capabilities are fully integrated for hybrid parallel programs that use MPI message passing between nodes and OpenMP shared-memory parallelism within each multicore cluster node.
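
To make the message-queue view concrete, here is a hedged sketch of the classic deadlock it helps expose: both ranks post a blocking receive before either send is reached, so each waits on the other indefinitely. The two-rank layout, file name and tag value are illustrative.

    /* deadlock.c -- illustrative two-rank deadlock: each rank blocks in
       MPI_Recv before its matching MPI_Send can execute.                 */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, peer, val = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        peer = 1 - rank;                      /* assumes exactly two ranks */

        /* Both ranks stop here forever; a message-queue display shows two
           pending receives with no matching sends posted.                 */
        MPI_Recv(&val, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&rank, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);

        printf("rank %d received %d\n", rank, val);
        MPI_Finalize();
        return 0;
    }

Reordering the send and receive on one of the ranks, or switching to nonblocking MPI_Isend/MPI_Irecv, removes the cycle.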

The main PGDBG window displays Fortran, C or C++ program source code, optionally interleaved with the corresponding x64 assembly code. In addition to the main source code window, PGDBG provides supplementary program information in a number of tabbed panels including call stack, registers, local variables, memory, a command line, events, graphical process and thread grid, status messages, MPI messages and group information. PGDBG is interoperable with the GNU gcc/g++ compilers on Linux.

The PGPROF OpenMP/MPI Profiler

PGPROF® is a powerful and easy-to-use interactive postmortem statistical analyzer for MPI parallel and OpenMP thread-parallel programs running on Linux clusters. Use PGPROF to visualize and diagnose the performance of the components of your program. PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also provides features that help you understand why certain parts of your program have high execution times.

Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters and hybrid clusters where each node contains multicore x64 processors. Use the PGPROF profiler to profile parallel programs, including multiprocess MPI programs, multi-threaded OpenMP programs, or a combination of both. PGPROF allows profiling at the function, source code line, and assembly instruction level for PGI-compiled Fortran, C and C++ programs. PGPROF provides views of the performance data for analysis of MPI communication, multiprocess and multi-thread load balancing, and scalability.

Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, enabling you to view all of this information simultaneously. PGPROF also supports a feedback-only mode, which allows you to browse compiler feedback associated with a CCFF-enabled binary executable in the absence of a performance profile.
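
As a rough sketch of how that feedback is gathered, a loop like the one below could be compiled with feedback enabled and then browsed in PGPROF alongside, or in place of, profile data. The -Minfo=ccff spelling comes from PGI documentation, but treat the exact build and collection steps as illustrative.

    /* dot.c -- a simple hot loop used to illustrate compiler feedback.
       Example build with feedback embedded:
           pgcc -fast -Minfo=ccff dot.c -o dot                             */
    #include <stdio.h>

    #define N 1048576

    int main(void)
    {
        static double a[N], b[N];
        double sum = 0.0;
        int i;

        for (i = 0; i < N; ++i) { a[i] = 0.5; b[i] = 2.0; }

        /* The compiler reports whether this loop was vectorized, unrolled
           or otherwise transformed; CCFF records that information so
           PGPROF can show it next to the loop's execution time.           */
        for (i = 0; i < N; ++i)
            sum += a[i] * b[i];

        printf("sum = %f\n", sum);
        return 0;
    }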

Each performance profile depends on the resources of the system where it is run. PGPROF provides a summary of the processor(s) and operating system(s) used by the application during any given performance experiment.

PGPROF provides the information necessary for determining which functions and lines in an application consume the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size and execution time of individual MPI function calls on a line-by-line basis.

Use PGPROF to merge trace files from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function or line level. Scalability analysis plainly displays which parts of your application are barriers to scalable performance, and where parallel tuning efforts should be focused. PGPROF displays information in easy-to-use formats such as bar-charts, percentages, counts or seconds.

PGI CDK Cluster Development Kit Key Features

  • Floating multi-user seats for the PGI parallel PGFORTRAN™, PGCC® and PGC++® compilers
  • World-class single core and multicore processor performance
  • Auto-parallelization for the latest AMD and Intel multicore processors
  • Full native support for OpenMP 3.1 directive- and pragma-based SMP or multicore parallelization in PGFORTRAN, PGCC and PGC++
  • OpenACC 2.0 directive-based parallelization for NVIDIA and AMD GPU accelerators
  • Graphical parallel PGDBG debugger and PGPROF performance profiler for auto-parallel, thread-parallel, OpenMP and MPI programs
  • Pre-configured MPI message-passing libraries (MPICH, Open MPI, MVAPICH)
  • Optimized BLAS, LAPACK and ScaLAPACK math libraries
  • Comprehensive support for all major Linux distributions
  • Installation utilities to simplify the setup and management of your Linux cluster

MPI Support

The OpenMP and MPI parallel PGDBG debugger and PGPROF performance profiler included with the PGI CDK support MPICH, MPICH2, MPICH3, SGI-MPI and Open MPI over Ethernet, and MVAPICH and MVAPICH2 over InfiniBand. MPICH, developed at Argonne National Laboratory, is an open source implementation of the Message-Passing Interface (MPI) standard. Because MPICH is a full implementation of MPI, your existing MPI applications will port easily to your Linux cluster using the PGI CDK.

MVAPICH, the "MPI over InfiniBand, iWARP and RDMA-enabled Interconnects" project, is led by the Network-Based Computing Laboratory, Department of Computer Science and Engineering at the Ohio State University.

Request a 30-day trial of the PGI CDK by completing the PGI CDK Evaluation Request Form.

Technical Features

A partial list of technical features supported includes the following:

  • PGFORTRAN™ native OpenMP, OpenACC and auto-parallel Fortran 2003 compiler with CUDA extensions
  • PGCC® OpenMP, OpenACC and auto-parallel ANSI C11 and K&R C compiler
  • PGC++® OpenMP, OpenACC and auto-parallel GNU 4.8 g++ compatible C++11 compiler with CUDA-x86 extensions
  • PGDBG® OpenMP and MPI parallel graphical debugger
  • PGPROF® OpenMP and MPI parallel graphical performance profiler
  • Full 64-bit support on multi-core AMD64 and Intel 64
  • Full support for OpenMP 3.1 on up to 256 cores
  • Comprehensive OpenACC 2.0 support.
  • PGI Unified Binary™ technology combines into a single executable or object file code optimized for multiple AMD64 processors, Intel 64 processors, NVIDIA GPUs or AMD GPUs.
  • Comprehensive set of compiler optimizations including one pass interprocedural analysis (IPA), interprocedural optimization of libraries, profile feedback optimization, dependence analysis and global optimization, function inlining including library functions, vectorization, invariant conditional removal, loop interchange, loop splitting, loop unrolling, loop fusion, cache tiling and more.
  • Support for 64-bit default integer and real data types (-i8/-r8 compilation flags)
  • Memory hierarchy and memory allocation optimizations including huge pages support
  • Auto-parallelization of loops specifically optimized for multi-core processors
  • Concurrent subroutine call support
  • Highly tuned Intel MMX and SSE intrinsics library routines (C/C++ only)
  • Tuning for non-uniform memory access (NUMA) architectures
  • Process/CPU affinity support in SMP/OpenMP applications
  • Support for creating shared objects
  • Integrated cpp pre-processing
  • Cray/DEC/IBM extensions (including Cray POINTERs & DEC STRUCTURES/UNIONS); support for SGI-compatible DOACROSS in Fortran
  • Full support for Common Compiler Feedback Format compiler optimization listings
  • User modules support simplifies switching between multiple compiler environments/versions
  • C/C++ plug-in for Eclipse
  • Bundled precompiled libraries including LAPACK version 3.4.2, ScaLAPACK version 2.0.2, MPICH version 3.1.3. Precompiled Open MPI 1.8.4 and MVAPICH 2.0 libraries are available for download separately.
  • Includes optimized ACML (LAPACK/BLAS/FFT) math library supported on Linux
  • Supports multi-threaded execution with the Intel Math Kernel Library (MKL) 10.1 and later
  • Optional PGI compiled IMSL Fortran numerical library available
  • Includes separate 64-bit x64 and 32-bit x86 development environments and compilers
  • Interoperable with TotalView and Allinea DDT.
  • Interoperable with gcc, g77, g++ and gdb
  • Unconditional 30-day money-back guarantee

System Requirements

  • Front-end Node: 64-bit x64 or 32-bit x86 processor-based workstation or server with one or more AMD or Intel microprocessors.
  • Cluster Nodes: 64-bit x64 or 32-bit x86 processor-based workstation or server with one or more AMD or Intel microprocessors.
    Note: Heterogeneous systems that include both 32-bit and 64-bit processor-based workstations or servers are not supported.
  • Accelerator (optional): NVIDIA CUDA-enabled GPU with compute capability 1.0 or later; AMD Radeon HD 7700, 7800 and 7900 series, R7 and R9 series GPUs (Cape Verde, Tahiti or Spectre).
  • Network: Standard TCP/IP network such as Ethernet, Fast Ethernet or Gigabit Ethernet; high-performance InfiniBand network. Preferred configuration is a dedicated private network interconnecting the cluster nodes, with the designated front-end node also networked to a general purpose network.
  • Operating System: Linux—all distributions with kernel revision 2.6 or newer and glibc 2.3.4 or newer.
  • Memory: Minimum 1 GB per cluster node. 2 GB recommended for front-end node.
  • Hard Disk: 1.5 GB on front-end node; 150 MB on each cluster node.
  • Other: Adobe Acrobat Reader for viewing documentation.