Earth Modeling
GEOS-5 (Goddard Earth Observing System Model, Version 5)
Jarrett Cohen, NASA High-End Computing Program, Goddard Space Flight Center
"The [porting effort] situation has improved with more robust versions of NVIDIA's CUDA C software development toolkits and the Portland Group, Inc.'s (PGI) CUDA Fortran Compiler; the latter is key for GEOS-5, which is primarily written in Fortran. Typical component porting times are now 2 weeks to 1 month."
Resources: Article (including results)

Environmental Science
Phytoplankton Prediction Over Time
Dr. Kerry Black, SurfPool, and Harsh Kumar, IIT Kharagpur
"The PGI compiler is now showing us just how powerful it is. On the software we are writing, it's at least 60 times faster on the NVIDIA card. We are very pleased and excited about the future uses. It's like owning a personal supercomputer."
Resources: Results

Evaluation
Accelerator Directives: A User's Perspective
Alistair Hart, Harvey Richardson (Cray ERI), Alan Gray (EPCC ETC), and Karthee Sivalingham (EPCC)
This paper reports on the authors' experience optimizing the GPU performance of three benchmarks using PGI Accelerator directives. In-depth work on the HPsrc benchmark yielded performance on par with CUDA.
Resources: Results, Paper, Presentation

CFD
Unsteady Turbulent Simulations on a Cluster of Graphics Processors
Everett H. Phillips, Roger L. Davis, and John D. Owens (UC Davis)
This paper describes a GPU-accelerated version of the MBFLO2 multi-block turbulent flow solver, implemented entirely in double precision using CUDA on the latest generation of GPUs. On a cluster of eight Tesla Fermi GPUs and Intel Nehalem quad-core CPUs, it achieved a 9x speedup over the parallel CPU solver and a 70x speedup over the serial solver.
Resources: Results, Paper

Library
CULA: CUDA-Enabled LAPACK Libraries
John Humphrey and Kyle Spagnoli (EM Photonics)
The CULA libraries are CUDA-enabled LAPACK libraries that can be called directly from PGI CUDA Fortran. A free basic version of the library is available (limited set of functions, single precision only).
Resources: Results, Article, Download

Benchmark
A CUDA Fortran Implementation of BWAVES
Gregory Ruetsch and Massimiliano Fatica (NVIDIA Corp.)
A port of the SPEC benchmark BWAVES to the CUDA architecture using CUDA Fortran. Covers identifying CPU bottlenecks, distributing code execution, hiding data transfers, and writing efficient CUDA kernels. Accuracy and performance results are presented for various platforms.
Resources: Results, Paper, Request Source Code

Numerical Analysis
Parallel Random Number Generation Using OpenMP, OpenCL and PGI Accelerator Directives
Federico Dal Castello, Advanced System Technology, STMicroelectronics, Italy
Optimizes random number generation code with four common parallel programming technologies: OpenMP, PGI Accelerator directives, CUDA, and OpenCL.
Resources: Results, Article, Source Code
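
The entry does not say which generator or parallelization scheme the article uses. One standard way to give each parallel worker its own reproducible slice of a single random sequence is skip-ahead on a linear congruential generator (LCG): worker k jumps directly to element k·n of the sequence instead of stepping there one value at a time. The Python sketch below assumes Knuth's 64-bit MMIX LCG constants and hypothetical helper names (`lcg_skip`, `generate_block`, `to_unit`); it illustrates the technique and is not the author's code.

```python
# Skip-ahead for a 64-bit linear congruential generator (LCG):
#   x_{n+1} = (A * x_n + C) mod 2**64
# Constants are Knuth's MMIX LCG (an assumption; other full-period LCGs work too).
LCG_A = 6364136223846793005
LCG_C = 1442695040888963407
LCG_M = 2**64


def lcg_next(x):
    """Advance the generator by one step."""
    return (LCG_A * x + LCG_C) % LCG_M


def lcg_skip(k):
    """Return (a_k, c_k) such that k steps map x to (a_k * x + c_k) mod M.

    Uses repeated squaring of the affine map, so the cost is O(log k).
    """
    acc_a, acc_c = 1, 0            # identity map (zero steps)
    cur_a, cur_c = LCG_A, LCG_C    # map for 2**i steps, starting with i = 0
    while k > 0:
        if k & 1:
            # Compose the accumulated map with the current power-of-two map.
            acc_a = (acc_a * cur_a) % LCG_M
            acc_c = (acc_c * cur_a + cur_c) % LCG_M
        # Square the current map: 2**i steps -> 2**(i+1) steps.
        cur_c = ((cur_a + 1) * cur_c) % LCG_M
        cur_a = (cur_a * cur_a) % LCG_M
        k >>= 1
    return acc_a, acc_c


def generate_block(seed, worker_id, per_worker):
    """Return x_{worker_id*per_worker} .. x_{(worker_id+1)*per_worker - 1}."""
    a_k, c_k = lcg_skip(worker_id * per_worker)
    x = (a_k * seed + c_k) % LCG_M   # jump straight to this worker's start
    out = []
    for _ in range(per_worker):
        out.append(x)
        x = lcg_next(x)
    return out


def to_unit(x):
    """Map a 64-bit state to a float in [0, 1) using its top 53 bits."""
    return (x >> 11) / 2**53
```

Because the per-worker blocks concatenate to exactly the serial sequence, the result does not depend on how many workers are used, which is usually the property a parallel port of a random number generator needs to preserve.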

Weather
GPU Acceleration of the Long-Wave Rapid Radiative Transfer Model in WRF Using CUDA Fortran
Gregory Ruetsch, Everett Phillips, and Massimiliano Fatica (NVIDIA Corp.)
Describes the process and methodology used to port the RRTM weather kernel to GPUs using CUDA C and CUDA Fortran.
Resources: Results, Paper, Presentation, Source Code

Numerical Analysis
Tuning a Monte Carlo Algorithm on GPUs
Mathew Colgrove (The Portland Group)
A step-by-step walkthrough of tuning a simple Monte Carlo integration algorithm for optimal performance on a GPU. Highlights several key concepts of CUDA Fortran, including reductions, contiguous data access, and mixing CUDA C with CUDA Fortran.
Resources: Results, Article
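
As a rough illustration of the reduction pattern the article highlights, the Python sketch below estimates a 1D integral by Monte Carlo and sums the samples in two stages: each hypothetical "block" produces a partial sum over its own slice, and a second pass adds the partial sums, mirroring how GPU reductions are commonly split. The function name and parameters are assumptions for illustration, not code from the article.

```python
import random


def mc_integrate(f, a, b, n_samples, n_blocks, seed=0):
    """Estimate the integral of f over [a, b] by Monte Carlo.

    The sum is reduced in two stages to mimic a GPU-style reduction:
    each of n_blocks hypothetical blocks sums its own slice of samples,
    then the per-block partial sums are added together.
    """
    rng = random.Random(seed)                 # seeded for reproducibility
    samples = [rng.uniform(a, b) for _ in range(n_samples)]

    # Stage 1: one partial sum per block over a contiguous slice.
    chunk = (n_samples + n_blocks - 1) // n_blocks   # ceiling division
    partial = [
        sum(f(x) for x in samples[i * chunk:(i + 1) * chunk])
        for i in range(n_blocks)
    ]

    # Stage 2: reduce the partial sums to a single total.
    total = sum(partial)
    return (b - a) * total / n_samples
```

For example, `mc_integrate(lambda x: x * x, 0.0, 1.0, 200_000, 64)` returns a value close to 1/3. Changing `n_blocks` changes only the order of the additions, not the estimate beyond rounding error.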

Benchmark
Building the Cactus BenchADM Benchmark with PGI Accelerator Fortran
Mathew Colgrove (The Portland Group)
A step-by-step guide to using the PGI Accelerator Fortran compiler to build a GPU-enabled version of BenchADM, a computational kernel representative of many applications in numerical relativity.
Resources: Results, Source Code, Paper

Benchmark
Building the STREAM Benchmark with CUDA Fortran
Brent Leback (The Portland Group)
This is a port of the STREAM benchmark to CUDA Fortran. It is useful for comparing the memory bandwidth of an NVIDIA GPU with that of a multi-core, multi-socket x64 server.
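
STREAM times four vector kernels (Copy, Scale, Add, Triad) and converts each time to bandwidth by counting the bytes the kernel must move: two arrays touched per element for Copy and Scale, three for Add and Triad. The Python sketch below shows the kernels and that accounting. Pure Python lists run far below real memory bandwidth, so the numbers it prints only illustrate the bookkeeping, not what the CUDA Fortran port measures; `run_stream` and `bandwidth_gbs` are hypothetical names.

```python
import time

# Arrays touched per element by each STREAM kernel (reads + writes).
WORDS_PER_ELEMENT = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def run_stream(n, scalar=3.0):
    """Run the four STREAM kernels once, in STREAM order, and time each."""
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n
    times = {}

    t0 = time.perf_counter()
    c = a[:]                                           # Copy:  c = a
    times["copy"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    b = [scalar * ci for ci in c]                      # Scale: b = q*c
    times["scale"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    c = [ai + bi for ai, bi in zip(a, b)]              # Add:   c = a + b
    times["add"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    a = [bi + scalar * ci for bi, ci in zip(b, c)]     # Triad: a = b + q*c
    times["triad"] = time.perf_counter() - t0

    return a, b, c, times


def bandwidth_gbs(kernel, n, seconds, word_bytes=8):
    """Bandwidth in GB/s (1e9 bytes), counting bytes the way STREAM does."""
    return WORDS_PER_ELEMENT[kernel] * word_bytes * n / seconds / 1e9


if __name__ == "__main__":
    n = 1_000_000
    *_, times = run_stream(n)
    for kernel, seconds in times.items():
        print(f"{kernel:6s} {bandwidth_gbs(kernel, n, seconds):8.3f} GB/s")
```

With 8-byte doubles, a Triad over 10^6 elements moves 24 MB, so finishing it in 1 ms corresponds to 24 GB/s.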

Benchmark
Building the Himeno Benchmark with PGI Accelerator Fortran
Doug Miles (The Portland Group)
Developed by Dr. Ryutaro Himeno at the Riken Advanced Center for Computing and Communication in Tokyo, the Himeno benchmark is based on a 3D Poisson solver.
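
The Himeno benchmark solves the Poisson equation with Jacobi iteration. The sketch below is a simplified stand-in, assuming a uniform grid with spacing h, fixed (Dirichlet) boundary values, and a 7-point stencil, so each interior update is p_new = (sum of the six neighbors - h²·f) / 6. The benchmark itself uses a 19-point stencil with coefficient arrays, so this only shows the shape of the computation; `make_grid` and `jacobi_sweep` are hypothetical names.

```python
import copy


def make_grid(n, boundary=1.0, interior=0.0):
    """n x n x n nested-list grid with one value on the faces, another inside."""
    def on_face(i, j, k):
        return i in (0, n - 1) or j in (0, n - 1) or k in (0, n - 1)
    return [[[boundary if on_face(i, j, k) else interior
              for k in range(n)] for j in range(n)] for i in range(n)]


def jacobi_sweep(p, f, h=1.0):
    """One Jacobi sweep for laplacian(p) = f with a 7-point stencil.

    Boundary values are held fixed. Returns the new grid and the sum of
    squared changes, a simple convergence measure.
    """
    nx, ny, nz = len(p), len(p[0]), len(p[0][0])
    new = copy.deepcopy(p)      # reads come only from the old iterate p
    change = 0.0
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                neighbors = (p[i - 1][j][k] + p[i + 1][j][k]
                             + p[i][j - 1][k] + p[i][j + 1][k]
                             + p[i][j][k - 1] + p[i][j][k + 1])
                value = (neighbors - h * h * f[i][j][k]) / 6.0
                change += (value - p[i][j][k]) ** 2
                new[i][j][k] = value
    return new, change
```

Because every update reads only the previous iterate, all interior points in a sweep are independent, which is what makes Jacobi straightforward to offload with accelerator directives. With every boundary value set to 1 and f = 0, repeated sweeps drive the interior toward 1, the exact solution of that problem.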

Demonstration
Building and Testing a Smooth Routine with PGI Accelerator Compilers
Ole W. Saastad (USIT, University of Oslo)
A simple test routine, "smooth," shows the potential for GPU acceleration. The full study looks at various methods of programming GPUs, including calling optimized GPU libraries from C and Fortran and using the PGI Accelerator compilers to compile Fortran code for NVIDIA GPUs.
Resources: Results, Paper
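
The entry does not reproduce the routine. A smoothing kernel of this kind is typically a weighted average over each point and its neighbors; the Python sketch below assumes a 9-point stencil with hypothetical weights `w0` (center), `w1` (edge neighbors), and `w2` (diagonal neighbors), and leaves the boundary rows and columns unchanged. The routine in the paper may differ.

```python
def smooth(b, w0, w1, w2):
    """9-point weighted smoothing of a 2D grid (hypothetical weights).

    Interior points get w0*center + w1*(edge neighbors) + w2*(diagonals);
    boundary rows and columns are copied unchanged.
    """
    n, m = len(b), len(b[0])
    a = [row[:] for row in b]          # start from a copy to keep the boundary
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            a[i][j] = (w0 * b[i][j]
                       + w1 * (b[i - 1][j] + b[i + 1][j] + b[i][j - 1] + b[i][j + 1])
                       + w2 * (b[i - 1][j - 1] + b[i - 1][j + 1]
                               + b[i + 1][j - 1] + b[i + 1][j + 1]))
    return a
```

Each `a[i][j]` depends only on the input grid `b`, so every interior point can be computed independently; that is the property that lets an accelerator compiler map a loop nest like this onto GPU threads. When `w0 + 4*w1 + 4*w2 == 1`, a constant field passes through unchanged.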

Tool
TAU Tuning & Analysis Tool
Performance Research Lab (University of Oregon)
TAU interfaces with the PGI runtime library to extract performance information about kernels executing on GPUs. TAU tracks interactions with GPUs as seen from the host. Performance data includes the routine name, file, and line number, as well as block and grid sizes and individual variable names.
Resources: Website, Download Package

Weather
Porting the WRF WSM52d Kernel to GPUs Using PGI Accelerator Fortran
Michael Wolfe and Craig Toepfer (The Portland Group)
Describes optimizing a key module of the Weather Research and Forecasting (WRF) application to run on GPUs, using a combination of structural modifications and compiler feedback.