In This Issue | AUG 2012
OpenACC Kernels and Parallel Constructs
Thread- and Instruction-level Parallelism in CUDA Fortran
Creating an OpenCL-enabled Android App with PGCL
Introduction to Texture Memory Support in CUDA Fortran
Upcoming Events
Michael Wolfe will be presenting part 3 of his webinar series on "Using OpenACC with the PGI Accelerator Compilers" Thursday, 20 September at 9:00 PDT. Registration is open now.
PGI is exhibiting in booth #1315 at SC12 in Salt Lake City, Utah 12-15 November.
Resources
PGI Accelerator with OpenACC
Getting Started Guide
Recent News
PGI Updates Its OpenCL Compiler for Multi-core ARM
PGI Ships PGI Accelerator Compilers with OpenACC
Next Issue
Advanced OpenACC Features
Multiple GPU Support with OpenACC
Comparing CUDA and OpenMP Performance on x86
PGI Accelerator Programming Model v2.0
The Portland Group, Inc.
Suite 320
Two Centerpointe Drive
Lake Oswego, OR 97035

OpenACC Kernels and Parallel Constructs
Michael Wolfe's Programming Guide
One key difference between the PGI Accelerator programming model and the OpenACC API is that the latter has two compute constructs: the kernels construct and the parallel construct. This article describes the differences between the two and use cases for each.
The OpenACC kernels and parallel constructs each address the same problem: identifying loop parallelism and mapping it to the parallelism of the target machine. The kernels construct is more implicit, giving the compiler more freedom to find and map parallelism according to the requirements of the target accelerator. The parallel construct is more explicit, and requires more analysis by the programmer to determine when it is legal and appropriate. | Continue to the article
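As a quick taste of the difference, here is a minimal sketch of the same SAXPY-style loop written with each construct; the subroutine and variable names are illustrative, not taken from the article:

    subroutine saxpy_both(n, a, x, y, z)
      integer, intent(in) :: n
      real, intent(in)    :: a, x(n)
      real, intent(inout) :: y(n), z(n)
      integer :: i

      ! kernels: the compiler analyzes the region, decides which loops
      ! are parallel, and chooses the mapping onto the accelerator
      !$acc kernels
      do i = 1, n
         y(i) = a*x(i) + y(i)
      end do
      !$acc end kernels

      ! parallel loop: the programmer asserts that the iterations are
      ! independent; the compiler takes that assertion at face value
      !$acc parallel loop
      do i = 1, n
         z(i) = a*x(i) + z(i)
      end do
    end subroutine saxpy_both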
Thread- and Instruction-level Parallelism in CUDA Fortran
In this article, Greg Ruetsch from NVIDIA discusses two basic techniques for exposing enough parallelism in CUDA kernels to keep the hardware busy and achieve good performance. Thread-level parallelism is straightforward, but there are situations where kernel resource consumption can restrict the number of concurrent threads on a multiprocessor. In those cases, instruction-level parallelism can provide an alternative. | Continue to the article
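As a rough illustration of the two techniques, here is a sketch (our own example, not code from the article) of the same scale-and-copy kernel written once with one element per thread and once with several independent elements per thread:

    module scale_kernels
      use cudafor
      integer, parameter :: ILP = 4   ! elements handled per thread in the ILP version
    contains
      ! thread-level parallelism: one array element per thread
      attributes(global) subroutine scale_tlp(a, b, n)
        real, device :: a(*), b(*)
        integer, value :: n
        integer :: i
        i = (blockIdx%x-1)*blockDim%x + threadIdx%x
        if (i <= n) b(i) = 2.0*a(i)
      end subroutine scale_tlp

      ! instruction-level parallelism: each thread handles ILP independent
      ! elements, so its loads can be issued back-to-back and fewer threads
      ! are needed to hide memory latency
      attributes(global) subroutine scale_ilp(a, b, n)
        real, device :: a(*), b(*)
        integer, value :: n
        integer :: i, j
        i = (blockIdx%x-1)*blockDim%x*ILP + threadIdx%x
        do j = 0, ILP-1
           if (i+j*blockDim%x <= n) b(i+j*blockDim%x) = 2.0*a(i+j*blockDim%x)
        end do
      end subroutine scale_ilp
    end module scale_kernels

The ILP version would be launched with proportionally fewer blocks, roughly (n + blockDim*ILP - 1)/(blockDim*ILP).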
Creating an OpenCL-enabled Android App with PGCL
In this article, we show step-by-step how to incorporate into an Android App an Activity that enables compute-intensive portions of the App to be accelerated on multiple ARM cores with NEON using the PGI OpenCL compiler. Starting with an OpenCL implementation of the SURF algorithm, we modified the compute kernels from their original GPU versions into a form more suitable for execution on a multi-core CPU. | Continue to the article
Introduction to Texture Memory Support in CUDA Fortran
CUDA C experiments have suggested that using texture memory can in some cases boost overall application performance on a GPU by 10% or more. Support for texture memory from within CUDA Fortran has been a regularly requested feature. PGI version 12.8 is the first release to include support for adding texture declarations to a CUDA Fortran module.
This article takes a first look at using texture memory from within CUDA Fortran. The article covers programming concepts and demonstrates the performance potential using example programs. | Continue to the article
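To give a flavor of the new syntax, here is a minimal sketch (our own example, with illustrative names): a texture reference is declared in a module, bound on the host to a device array with the target attribute, and then read through in the kernel:

    module tex_mod
      use cudafor
      real, texture, pointer :: aTex(:)   ! module-level texture reference
    contains
      attributes(global) subroutine add_from_texture(b, n)
        real, device :: b(*)
        integer, value :: n
        integer :: i
        i = (blockIdx%x-1)*blockDim%x + threadIdx%x
        ! reads of aTex go through the texture cache path
        if (i <= n) b(i) = b(i) + aTex(i)
      end subroutine add_from_texture
    end module tex_mod

    ! host-side sketch:
    !   real, device, target :: a_d(n)
    !   aTex => a_d                      ! bind the texture to the device array
    !   call add_from_texture<<<grid, tBlock>>>(b_d, n)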