PGPROF® is a powerful and simple-to-use interactive postmortem statistical analyzer for parallel programs written with OpenMP or OpenACC directives or accelerated using CUDA. Use PGPROF to visualize and diagnose the performance of the components of your program. PGPROF associates execution time with the source code and instructions of your program, allowing you to see where and how execution time is spent. Through resource utilization data and compiler feedback information, PGPROF also helps you understand why parts of your program have high execution times.
Use PGPROF to analyze programs on multicore SMP servers, distributed-memory clusters, and hybrid clusters where each node contains both multicore x64 processors and accelerators. PGPROF can profile multi-threaded OpenMP programs, GPU-accelerated programs, or a combination of both. PGPROF supports profiling at the function level or the source-line level for Fortran, C, and C++ programs built with PGI or non-PGI compilers.
Using the Common Compiler Feedback Format (CCFF), PGI compilers save information about how your program was optimized, or why a particular optimization was not made. PGPROF can extract this information and associate it with source code and other performance data, allowing you to view all of this information simultaneously.
PGPROF provides the information necessary for determining which functions and lines in an application are consuming the most execution time. Combined with the feedback features of the PGI compilers, PGPROF helps you maximize vectorization and performance on a single x64 processor core. On GPUs, PGPROF reports performance-critical information, including initialization, data transfer, and kernel execution times.
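To illustrate how a statistical profiler can turn raw observations into a ranked list of hot spots, here is a minimal Python sketch. It is not PGPROF's implementation: the `flat_profile` helper and the sample data are hypothetical, and each sample is assumed to be a (function, line) pair recording where the program was executing at one instant.

```python
from collections import Counter


def flat_profile(samples):
    """Rank (function, line) locations by their share of all samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    # most_common() yields locations from most to least frequently sampled.
    return [(loc, n, 100.0 * n / total) for loc, n in counts.most_common()]


# Hypothetical samples: (function name, source line) at each sampling instant.
samples = [("solve", 42)] * 6 + [("solve", 57)] * 3 + [("init", 10)] * 1

for (func, line), n, pct in flat_profile(samples):
    print(f"{pct:5.1f}%  {n:3d} samples  {func}:{line}")
```

Locations that appear in more samples are assumed to account for proportionally more execution time, which is why the ranking is only as accurate as the number of samples collected.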
Performance data from your application is collected via a very-low-overhead, sample-based method that does not require you to recompile your application. Additionally, to gather information about GPU-accelerated applications, PGPROF queries the OpenACC tools interface to provide details about what is happening in each OpenACC kernel as it executes. The CUDA runtime is also profiled to provide information about your program's execution on the GPU.
PGPROF offers a combined view showing both GPU and x64 host performance for an application. On the CPU, you can examine performance statistics on a per-thread basis. On the GPU, performance statistics are presented on a per-kernel basis.
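The general idea behind sample-based collection can be sketched in a few lines of Python. This is only an illustration of the technique, not how PGPROF itself is implemented: the names `sample_function_names`, `busy_work`, and `profile` are hypothetical, and the sketch relies on the CPython-specific `sys._current_frames()` call. A background thread wakes at a fixed interval and records which function the workload is executing; the workload itself is not modified or instrumented.

```python
import collections
import sys
import threading
import time


def sample_function_names(target_thread_id, counts, stop_event, interval=0.005):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        # CPython-specific: maps thread ids to their current stack frames.
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_work(duration):
    """Uninstrumented workload: spin until `duration` seconds have elapsed."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def profile(func, *args):
    """Run func(*args) in the calling thread while a sampler observes it."""
    counts = collections.Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_function_names,
        args=(threading.get_ident(), counts, stop_event),
        daemon=True,
    )
    sampler.start()
    try:
        func(*args)
    finally:
        stop_event.set()
        sampler.join()
    return counts


counts = profile(busy_work, 0.3)
print(counts.most_common(3))
```

Because the sampler only observes the running program at intervals, its overhead depends on the sampling rate rather than on how often functions are called. The exact sample counts vary from run to run.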
A partial list of technical features supported includes the following:
PGPROF OpenMP & OpenACC Profiler
(included with PGI Workstation class and PGI Server class products and the PGI CDK)