In combination with the Linux or Windows Compute Cluster Server 2003 operating systems, the PGI CDK® Cluster Development Kit® compilers and development tools enable use of networked clusters of AMD or Intel x64 processor-based workstations and servers to tackle serious scientific computing applications. For Linux, the PGI CDK includes pre-configured versions of MPI for Ethernet or InfiniBand, and a pre-configured batch queueing system. On Windows CCS, the PGI CDK integrates with MSMPI and the job scheduler to enable development, debugging and tuning of high-performance MPI or hybrid MPI/OpenMP applications written in Fortran, C or C++.
PGI compilers offer world-class performance and features including auto-parallelization for multi-core, OpenMP directive-based parallelization, and support for the PGI Unified Binary technology. The PGI Unified Binary streamlines cross-platform support by combining into a single executable file code optimized for multiple x64 processors. This gives you the assurance that your applications will run correctly and with optimal performance regardless of the type of x64 processor on which they are deployed. PGI's state-of-the-art compiler optimization technologies include SSE vectorization, auto-parallelization, inter-procedural analysis and optimization, memory hierarchy optimizations, function inlining (including library functions), profile-feedback optimization, CPU-specific micro-architecture optimizations and more. PGI is the ideal solution for migrating compute-intensive legacy applications from RISC/UNIX servers and workstations to 64-bit Linux or Windows CCS clusters.
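As a rough illustration of OpenMP directive-based parallelization, the sketch below annotates a simple reduction loop in C. The function name `dot_product` and its arguments are hypothetical and chosen only for this example. It assumes a compiler with OpenMP support enabled (with the PGI compilers this is typically requested with the `-mp` option); when OpenMP is not enabled, the directive is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product of two arrays of length n.
   The directive asks an OpenMP-enabled compiler to divide the loop
   iterations among threads and to combine the per-thread partial sums
   into `sum` once the loop finishes. */
double dot_product(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;
    long count = (long)n;  /* signed loop index, accepted by older OpenMP versions */
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < count; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

A loop of this shape, with unit-stride accesses and no dependence between iterations other than the reduction, is also the kind of loop that vectorizing and auto-parallelizing compilers generally handle well.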
Debugging a cluster MPI application can be extremely challenging. The PGDBG® debugger provides a comprehensive set of graphical user interface (GUI) elements to assist you in this process. PGDBG provides the ability to separately debug and control OpenMP threads and MPI processes on your Linux or Windows CCS cluster. Step, Next, Break, Halt, Wait or Continue OpenMP threads or MPI processes individually, as a group, or in user-defined process/thread subsets. PGDBG can even display the state of MPI message queues, enabling you to quickly isolate and resolve message-passing deadlock bugs.
Using a single integrated multi-process debugging window, PGDBG provides precise control and feedback on the state of every MPI process and OpenMP thread simultaneously, with fully integrated capabilities for debugging hybrid parallel programs that use MPI message-passing between nodes and OpenMP shared-memory parallelism within a multi-core processor-based cluster node.
The main PGDBG window displays Fortran, C or C++ program source code, optionally interleaved with the corresponding x64 assembly code. Sub-windows enable watch points, register state dumps, and execution of a sequence of user-defined commands at every break point. The main window includes one-touch buttons for the most common debugging commands. A simple and intuitive process/thread grid makes it easy to change the context of the source window and all sub-windows from one process to another with a single mouse click, greatly simplifying control over individual or collective OpenMP threads and MPI processes. PGDBG is interoperable with the Microsoft Visual C++ compiler on Windows CCS, and with the GNU gcc/g++ compilers on Linux.
PGPROF® is an interactive, powerful and easy-to-use postmortem statistical analyzer for MPI parallel and OpenMP thread-parallel programs running on Linux or Windows CCS clusters. You can use PGPROF to analyze programs on multi-core SMP servers, distributed-memory clusters and hybrid clusters where each node contains multi-core x64 processors. PGPROF allows profiling at the function, source code line, and assembly instruction level for PGI-compiled Fortran, C and C++ programs.
PGPROF provides the information you need to determine which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF will enable you to maximize vectorization and performance on a single x64 processor core. PGPROF exposes performance bottlenecks in a cluster application by presenting the number of calls, aggregate message size and execution time of individual MPI function calls on a line-by-line basis.
Using PGPROF, you can merge trace files from multiple runs on different numbers of nodes to perform scalability analysis on your MPI or OpenMP application at the application, function or line level. Scalability analysis allows you to quickly see which parts of your application are barriers to scalable performance, and where your parallel tuning efforts should be focused. PGPROF presents information as bar charts, percentages, counts or seconds, and displays profiles using graphical histograms.
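To make the idea of scalability analysis concrete, the following C sketch computes the two quantities usually compared across runs: speedup and parallel efficiency. This is not PGPROF's implementation. The function names `speedup` and `efficiency` are hypothetical, and the definitions used are the conventional ones (speedup is the baseline time divided by the run time; efficiency is speedup divided by the ideal speedup for the node count).

```c
#include <assert.h>

/* Speedup of a run relative to a baseline run of the same code region. */
double speedup(double base_seconds, double run_seconds)
{
    assert(base_seconds > 0.0 && run_seconds > 0.0);
    return base_seconds / run_seconds;
}

/* Parallel efficiency: measured speedup divided by the ideal speedup,
   where the ideal speedup is the ratio of node counts. A value of 1.0
   means perfect scaling; lower values mean the region scales poorly. */
double efficiency(double base_seconds, int base_nodes,
                  double run_seconds, int run_nodes)
{
    assert(base_nodes > 0 && run_nodes > 0);
    double ideal = (double)run_nodes / (double)base_nodes;
    return speedup(base_seconds, run_seconds) / ideal;
}
```

For example, with made-up timings, a function that takes 100 seconds on 1 node and 25 seconds on 8 nodes has a speedup of 4 and an efficiency of 0.5, which marks it as a candidate for parallel tuning.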
On Linux, the OpenMP and MPI parallel PGDBG debugger and PGPROF performance profiler included with the PGI CDK support MPICH and MPICH2 over Ethernet, and MVAPICH over InfiniBand. MPICH (including MPICH2), developed at Argonne National Laboratory, is an open source implementation of the Message-Passing Interface (MPI) standard. Because MPICH is a full implementation of MPI, your existing MPI applications will port easily to your Linux cluster using the PGI CDK.
MVAPICH, the "MPI over InfiniBand, iWARP and RDMA-enabled Interconnects" project, is led by the Network-Based Computing Laboratory in the Department of Computer Science and Engineering at Ohio State University.
On Windows, the OpenMP and MPI parallel PGDBG debugger and PGPROF performance profiler included with the PGI CDK support MSMPI.
Request a 15-day trial of the PGI CDK by completing the PGI CDK Evaluation Request Form.
*About the PGI Roll - The PGI Roll is distributed through Clustercorp. The software download is free but registration is required. The PGI Roll contains software only. A valid PGI license is required to use the software. A valid PGI CDK license is required to enable remote MPI debugging and profiling.
A partial list of technical features supported includes the following:
Note: Heterogeneous systems that include both 32-bit and 64-bit processor-based workstations or servers are not supported.