GPU Tutorial Summary

Michael Wolfe from PGI will conduct a half-day tutorial on "Introduction to Programming GPUs" on the morning of Tuesday, 31 May 2011, in Tucson, Arizona, in conjunction with the 25th International Conference on Supercomputing (ICS 2011). Registration and cost information is available from the ICS 2011 web site. You may also be interested in the afternoon tutorial on "Performance Optimization on GPUs".

This tutorial studies GPU programming as a parallel programming skill. The tutorial begins with an introduction to GPU architectures and why programming GPUs differs from programming multi-core systems. CUDA is NVIDIA's architecture for executing highly parallel applications on GPUs; C for CUDA and PGI CUDA Fortran are extensions to C and Fortran for programming to the CUDA model. We dive into CUDA programming, both in C and Fortran, introduce basic concepts, and write and build simple examples. The tutorial then shifts to a discussion of high-level directive-based programming models for GPUs. We introduce the PGI Accelerator programming model and the OpenMP accelerator directives currently being designed. Once again, we start with basic concepts and show how to write and build simple examples. We close with a comparison of the methods and advantages of each approach.

The target audience is C and Fortran programmers who have tuned programs for vector and/or parallel computers in the past and who want to explore using GPUs for high performance computing. The tutorial will consist of interactive lectures with live demonstrations on a programmable GPU.

Tentative Schedule

Date: Tuesday, 31 May 2011

08:30-09:00 - CPU Architecture vs. GPU Architecture
09:00-09:30 - Low-level GPU Programming, CUDA
09:30-09:45 - The Host Program
09:45-10:00 - Writing CUDA Kernels
10:00-10:30 - Break
10:30-10:50 - Building and Running CUDA Programs
10:50-11:15 - High-level GPU Programming using the PGI Accelerator Model
11:15-11:40 - Building and Running Accelerator Programs
11:40-12:00 - Interpreting Compiler Feedback

Schedule is subject to minor changes.

Tutorial Syllabus

Part I. Introduction (0:30)

  1. CPU Architecture vs. GPU Architecture
    • CPU Architecture basics
    • Multicore and multiprocessor basics
    • GPU Architecture basics
    • How is the GPU connected to the CPU?
  2. Why is parallel programming for the GPU different from programming for multicore?
    • What is a GPU thread? How does it execute?
    • Identifying my GPU

Part II. CUDA, C and Fortran (1:20)

  1. Low-level GPU Programming using CUDA
    • How does data get to the GPU?
    • How does a program run on the GPU?
    • What kinds of parallelism are appropriate for the GPU?
    • The role of the host program
    • The GPU kernel: writing and launching a GPU kernel
  2. The Host Program
    • Declaring and allocating device memory data
    • Moving data to and from the device
    • Launching kernels
  3. GPU Kernels
    • What is allowed in a kernel
    • Grids, Blocks, Threads, Warps
  4. Building and Running CUDA Programs
    • Compiler options
    • Running your program
    • The CUDA Runtime API
    • CUDA Fortran vs. CUDA C
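
As a rough illustration of the topics in Part II, the following CUDA C sketch declares and allocates device memory, moves data to the device, launches a kernel over a grid of thread blocks, and copies the result back. It is not taken from the tutorial materials. The names vec_add, run_vector_add and CHECK, the problem size and the block size of 256 are assumptions chosen for the example. Only standard CUDA Runtime API calls are used.

```cuda
// Illustrative sketch only; names and sizes are assumptions, not tutorial code.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Return -1 from the enclosing function if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s\n",                       \
                    cudaGetErrorString(err_));                        \
            return -1;                                                \
        }                                                             \
    } while (0)

// GPU kernel: each thread computes one element of c = a + b.
__global__ void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the last block
        c[i] = a[i] + b[i];
}

// Host program: returns the number of wrong elements, or -1 on a CUDA error.
// Host buffers are not freed on the error path, to keep the sketch short.
int run_vector_add(int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    if (!h_a || !h_b || !h_c) return -1;
    for (int i = 0; i < n; ++i) {
        h_a[i] = (float)i;
        h_b[i] = 2.0f * (float)i;
    }

    // Declare and allocate device memory.
    float *d_a = NULL, *d_b = NULL, *d_c = NULL;
    CHECK(cudaMalloc((void **)&d_a, bytes));
    CHECK(cudaMalloc((void **)&d_b, bytes));
    CHECK(cudaMalloc((void **)&d_c, bytes));

    // Move input data from the host to the device.
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    // Launch the kernel: enough 256-thread blocks to cover n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);
    CHECK(cudaGetLastError());

    // Move the result back; this copy waits for the kernel to finish.
    CHECK(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));

    int wrong = 0;
    for (int i = 0; i < n; ++i)
        if (h_c[i] != 3.0f * (float)i) ++wrong;

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return wrong;
}

int main(void)
{
    int wrong = run_vector_add(1 << 20);
    if (wrong == 0) printf("vector add OK\n");
    else            printf("vector add failed (%d)\n", wrong);
    return wrong == 0 ? 0 : 1;
}
```

With NVIDIA's toolkit, a file like this would typically be built with nvcc (for example, nvcc vecadd.cu -o vecadd, where the file name is hypothetical) and needs a CUDA-capable GPU to run.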

Part III. The PGI Accelerator Model (1:10)

  1. High-level GPU Programming using the PGI Accelerator Model
    • What role does a high-level model play?
    • Basic concepts and directive syntax
    • Accelerator compute and data regions
    • Appropriate algorithms for a GPU
  2. Building and Running Accelerator Programs
    • Compiler options
    • Enabling and interpreting compiler feedback
  3. Directive details
    • Region directive
    • Loop directive

About the Presenter

Michael Wolfe has been a compiler engineer at The Portland Group since 1996, where his responsibilities and interests include deep compiler analysis and optimizations ranging from improving power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and before that was a cofounder and lead compiler engineer at Kuck and Associates, Inc. He received a PhD in Computer Science from the University of Illinois and has published a textbook, High Performance Compilers for Parallel Computing, a monograph, Optimizing Supercompilers for Supercomputers, and many technical papers. Dr. Wolfe is also a Fellow with STMicroelectronics.
