Technical News from The Portland Group


In This Issue | AUG 2012

OpenACC Kernels and Parallel Constructs

Thread- and Instruction-level Parallelism in CUDA Fortran

Creating an OpenCL-enabled Android App with PGCL

Introduction to Texture Memory Support in CUDA Fortran

Upcoming Events

Michael Wolfe will be presenting part 3 of his webinar series on "Using OpenACC with the PGI Accelerator Compilers" Thursday, 20 September at 9:00 PDT. Registration is open now.

PGI is exhibiting in booth #1315 at SC12 in Salt Lake City, Utah, 12-15 November.


PGI Accelerator with OpenACC
Getting Started Guide

PGI Accelerator

CUDA Fortran

PGI CUDA C/C++ for x86

PGI User Forums

Recent News

PGI Updates Its OpenCL Compiler for Multi-core ARM

PGI Ships PGI Accelerator Compilers with OpenACC

Next Issue

Advanced OpenACC Features

Multiple GPU Support with OpenACC

Comparing CUDA and OpenMP Performance on x86

PGI Accelerator Programming Model v2.0

The Portland Group, Inc.
Suite 320
Two Centerpointe Drive
Lake Oswego, OR 97035

OpenACC Kernels and Parallel Constructs

Michael Wolfe's Programming Guide

One key difference between the PGI Accelerator programming model and the OpenACC API is that the latter has two compute constructs: the kernels construct and the parallel construct. This article describes the differences between the two and the use cases for each.

The OpenACC kernels and parallel constructs both address the same problem: identifying loop parallelism and mapping it to the machine parallelism. The kernels construct is more implicit, giving the compiler more freedom to find and map parallelism according to the requirements of the target accelerator. The parallel construct is more explicit, and requires more analysis by the programmer to determine when it is legal and appropriate.
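As a rough illustration of that distinction, the sketch below writes the same SAXPY loop twice in C with OpenACC directives, once with each construct. The function names saxpy_kernels and saxpy_parallel and the choice of data clauses are illustrative assumptions, not code taken from the article. A compiler without OpenACC support ignores the directives and simply runs both loops sequentially.

```c
#include <assert.h>

/* Hypothetical example: kernels construct.
 * The compiler analyzes the region, decides whether the loop is safe to
 * parallelize, and chooses how to map it onto the accelerator. */
void saxpy_kernels(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    #pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical example: parallel construct.
 * The programmer asserts that the loop iterations are independent; the
 * compiler parallelizes the loop as directed. */
void saxpy_parallel(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. The difference lies in who takes responsibility for the correctness of the parallelization: the compiler in the first case, the programmer in the second.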

Thread- and Instruction-level Parallelism in CUDA Fortran

In this article, Greg Ruetsch from NVIDIA discusses two basic techniques for exposing enough parallelism in CUDA kernels to keep the hardware busy and achieve good performance. Thread-level parallelism is straightforward, but there are situations in which kernel resource consumption restricts the number of concurrent threads on a multiprocessor. In those cases, instruction-level parallelism can provide an alternative.

Creating an OpenCL-enabled Android App with PGCL

In this article, we show step by step how to incorporate an Activity into an Android App that enables compute-intensive portions of the App to be accelerated on multiple ARM cores with NEON using the PGI OpenCL compiler. Starting with an OpenCL implementation of the SURF algorithm, we modified the compute kernels from their original GPU versions into a form more suitable for execution on a multi-core CPU.

Introduction to Texture Memory Support in CUDA Fortran

CUDA C experiments have suggested that using texture memory can in some cases boost overall application performance on a GPU by 10% or more. Support for texture memory from within CUDA Fortran has been a regularly requested feature. PGI version 12.8 is the first release to include support for adding a texture target declaration to a CUDA Fortran module.

This article takes a first look at using texture memory from within CUDA Fortran. The article covers programming concepts and demonstrates the performance potential using example programs.