PGI Tutorial Summary

Michael Wolfe from PGI will be conducting a one-day tutorial on "GPU Programming with CUDA Fortran, CUDA C and the PGI Accelerator Programming Model" on Monday, 15 November 2010, in New Orleans. The tutorial includes:

  • Full-day training course (syllabus below)
  • The option to purchase a single-seat node-locked license to any PGI Accelerator product* for $250. Save up to 83% off list price!
  • Printed and/or electronic copy of the presentation materials
  • Hosted lunch

To participate, please complete the registration form.

* All PGI Accelerator Fortran products include CUDA Fortran.


Schedule

Date: Monday, 15 November 2010

08:00 - 08:30 - Registration and Coffee
08:30 - 09:00 - Introduction to GPU programming
09:00 - 10:00 - Programming with CUDA C and Fortran
10:00 - 10:15 - Break
10:15 - 12:00 - CUDA Programming (continued)
12:00 - 13:00 - Lunch
13:00 - 14:30 - PGI Accelerator Directives
14:30 - 14:45 - Break
14:45 - 16:15 - PGI Accelerator Directives (continued)
16:15 - 16:30 - Wrap Up

The schedule is subject to minor changes. The tutorial will be held at the Hotel New Orleans Convention Center, one block from the Ernest N. Morial Convention Center, the site of SC10.

Tutorial Syllabus

Part I. Introduction

  1. CPU Architecture vs. GPU Architecture
    • CPU Architecture Basics
    • Multicore and multiprocessor basics
    • GPU Architecture Basics
    • How is the GPU connected to the host?
  2. Why is parallel programming for GPUs different than for multicore?
    • What is a GPU thread and how does it execute?
    • How can I identify my GPU?

Part II. CUDA, C and Fortran

  1. Low-level GPU Programming with CUDA
    • How does data get to the GPU?
    • How does a program run on the GPU?
    • What kinds of parallelism are appropriate for a GPU?
    • The CUDA programming model
      • Host code to control GPU, allocate memory, launch kernels
      • Kernel code to execute on GPU
        • Scalar routine executed on one thread
        • Launched in parallel on a grid of thread blocks
  2. The Host Program
    • Declaring and allocating device memory data
    • Moving data to and from the device
    • Launching kernels
  3. Writing Kernels
    • What is allowed in a kernel vs. what is not allowed
    • Grids, Blocks, Threads, Warps
  4. Building and Running CUDA Programs
    • Compiler options
    • Running your program
    • The CUDA Runtime API
    • CUDA Fortran vs. CUDA C
  5. Performance Tuning, Tips and Tricks
    • Measuring performance, using cudaprof
      • Occupancy, memory coalescing
    • Optimizing your kernels
      • Optimize communication between host and GPU
      • Optimize device memory accesses, shared memory usage
      • Optimize the kernel code
    • Debugging using emulation

Part III. PGI Accelerator Model

  1. High-level GPU Programming using the PGI Accelerator Model
    • What role does a high-level model play?
    • Basic concepts and directive syntax
    • Accelerator compute and data regions
    • Appropriate algorithms for a GPU
  2. Building and Running Accelerator Programs
    • Command line options
    • Enabling compiler feedback
  3. Accelerator Directive Details
    • Compute regions
      • Clauses on the compute region directive
      • What can appear in a compute region
      • Obstacles to successful acceleration
    • Loop directive
      • Clauses on the loop directive
      • Loop schedules
    • Data regions
      • Clauses on the data region directive
  4. Interpreting compiler feedback
    • Using pgprof source browser
    • Hindrances to parallelism
    • Data movement feedback
    • Reading kernel schedules
  5. Performance Tuning, Tips and Tricks
    • Choosing accelerator device
    • PGI Unified Binary for multiple host or multiple accelerators
    • Performance profiling information
    • Optimizing initialization time
  6. Performance Tuning (50 minutes)
    • Appropriate algorithm
    • Optimizing data movement between host and GPU
    • Optimizing kernel performance
    • Tuning the kernel schedule

Part IV. Wrap-up, Questions

  1. Accelerators in HPC
    • Past, present, future role of accelerators in HPC
    • Past, present, future of programming models for accelerators
    • How to reach an exaflop

About the Presenter

Michael Wolfe has been a compiler engineer at The Portland Group since joining in 1996, where his responsibilities and interests include deep compiler analysis and optimizations ranging from improving power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and before that was a cofounder and lead compiler engineer at Kuck and Associates, Inc. He was granted a PhD in Computer Science from the University of Illinois and has published a textbook, High Performance Compilers for Parallel Computing; a monograph, Optimizing Supercompilers for Supercomputers; and many technical papers. Dr. Wolfe is also a Fellow with STMicroelectronics.

Location

Hotel New Orleans Convention Center
Diamond Room
881 Convention Center Boulevard
New Orleans, LA 70130-1754
(504) 524-1881

Terms and Conditions

  • Price is $500US per attendee.
  • Please specify your interest in purchasing a discounted license for the PGI Accelerator compiler of your choice when registering.
  • Attendance is limited to the first 50 registrants, first come first served.
  • 100% registration fee credit for cancellations received before 6 November 2010. 50% credit for cancellations received between 6–13 November. No credit for cancellations received after 13 November.