PGI Accelerator Compilers with OpenACC Directives

Overview

PGI Accelerator Compilers

Using PGI Accelerator™ compilers, programmers can accelerate applications on x64+accelerator platforms by adding OpenACC compiler directives to existing high-level standard-compliant Fortran, C and C++ programs and then recompiling with appropriate compiler options.

Sample Fortran matrix multiplication loop, tagged to be compiled for an accelerator.

!$acc kernels 
      do k = 1,n1
       do i = 1,n3
        c(i,k) = 0.0
        do j = 1,n2
         c(i,k) = c(i,k) + a(i,j) * b(j,k)
        enddo
       enddo
      enddo
!$acc end kernels
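The same loop nest can be written in C99 with the equivalent `#pragma acc kernels`. The sketch below assumes flat row-major arrays and a hypothetical function name `matmul`. Bare C pointers carry no size information, so it adds `copyin`/`copyout` clauses with array sections that tell the compiler how much data to move. It accumulates into a local `sum` instead of into `c` directly, which is a common idiom rather than a requirement. When built without OpenACC support, the pragma is typically ignored and the loops run on the host.

```c
/* c = a * b, where a is n3 x n2, b is n2 x n1 and c is n3 x n1,
 * all stored row-major in flat arrays (an assumption of this sketch). */
void matmul(int n1, int n2, int n3,
            const float *restrict a,
            const float *restrict b,
            float *restrict c)
{
    /* Ask the compiler to generate accelerator code for the loop nest.
     * The array sections give the extents it cannot infer from pointers. */
#pragma acc kernels copyin(a[0:n3*n2], b[0:n2*n1]) copyout(c[0:n3*n1])
    for (int k = 0; k < n1; ++k) {
        for (int i = 0; i < n3; ++i) {
            float sum = 0.0f;
            for (int j = 0; j < n2; ++j)
                sum += a[i * n2 + j] * b[j * n1 + k];
            c[i * n1 + k] = sum;
        }
    }
}
```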
	

PGI 2010 and later releases include the PGI Accelerator Fortran and C99 compilers supporting x64+NVIDIA systems running under Linux, Mac OS X and Windows. PGI introduced support for OpenACC directives with Release 2012 version 12.6 of the PGI Accelerator compilers and support for C++ was added with Release 2013.

PGI Accelerator compilers are supported on all Intel and AMD x64 processor-based systems with CUDA-enabled NVIDIA GPUs.

How They Work

Until now, developers targeting HPC accelerators have had to rely on language extensions to their programs. Programmers of x64+accelerator systems have been required to work at a detailed level. They have had to understand and specify data usage and manually construct sequences of calls to manage all movement of data between the x64 host and the accelerator.

OpenACC

The PGI Accelerator compilers automatically analyze whole program structure and data. They split portions of the application between the x64 host CPU and the accelerator device as specified by a standard set of user directives. They also define and generate an optimized mapping of loops that automatically uses the parallel cores, hardware threading capabilities and SIMD vector capabilities of modern accelerators. Some directives and pragmas specify which regions of code or functions to accelerate. Others give the programmer fine-grained control over the mapping of loops, allocation of memory, and optimization for the accelerator memory hierarchy. The PGI Accelerator compilers generate unified object files and executables that manage all movement of data to and from the accelerator. They leverage all existing host-side utilities (linker, librarians, makefiles) and require no changes to the existing standard HPC Linux/x64 programming environment.
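As a rough illustration of this fine-grained data control, the following C sketch wraps two compute regions in an OpenACC `data` region. The arrays then move to the accelerator once rather than once per loop. The function name `scale_then_add` is hypothetical. The `gang vector` clauses request a parallel mapping of each loop and leave the actual gang and vector sizes to the compiler.

```c
/* Computes y = alpha * y + x in two separate passes (illustrative only). */
void scale_then_add(int n, float alpha,
                    const float *restrict x, float *restrict y)
{
    /* One data region around both compute regions: x is copied in once,
     * and y is copied in once and copied back out once at the end. */
#pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        /* Pass 1: y = alpha * y. present() states y is already on the device. */
#pragma acc kernels loop gang vector present(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] *= alpha;

        /* Pass 2: y = y + x, reusing the device copies of x and y. */
#pragma acc kernels loop gang vector present(x[0:n], y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] += x[i];
    }
}
```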

Resources

FAQ

Please also see the PGI Accelerator Programming user forum for additional questions and answers.


Q Which programming languages do the PGI Accelerator compilers support?

A PGI supports accelerators from within the PGFORTRAN™ Fortran 2003, PGCC® ANSI C99 and PGC++® GNU-compatible C++ compilers.

Q On which operating systems do PGI Accelerator compilers run?

A PGI 2011 and later releases include support for 64-bit and 32-bit Linux, Windows and Mac OS X.

Q Which accelerators can be targeted by PGI Accelerator compilers?

A PGI Accelerator compilers target all CUDA-enabled NVIDIA GPU accelerators with compute capability 1.0 or higher.

Q Do I need to install any third-party software?

A To use NVIDIA CUDA-enabled GPUs, you must first install the CUDA driver for your system. All other necessary third-party software is included in the PGI installation package.

Q Does the compiler support IEEE-standard floating-point arithmetic?

A The GPU accelerators available today support most of the IEEE floating-point standard. However, they do not support all the rounding modes, and some operations, notably square root, exponential, logarithm, and other transcendental functions, may not deliver full precision results. This is a hardware limitation that compilers cannot overcome.
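Accelerator results may differ from host results in the last few bits. One practical consequence is that accelerator output should be checked against a host reference with a tolerance rather than exact equality. A minimal C sketch follows. The helper names and the combined relative/absolute tolerance test are assumptions, and suitable tolerance values depend on the application.

```c
/* Returns 1 if a and b agree within rel_tol relative to the larger
 * magnitude, or within abs_tol absolutely (useful near zero); else 0. */
int nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    double diff = a > b ? a - b : b - a;
    double ma = a < 0 ? -a : a;
    double mb = b < 0 ? -b : b;
    double scale = ma > mb ? ma : mb;
    return diff <= abs_tol || diff <= rel_tol * scale;
}

/* Counts elements of an accelerator-computed array that fall outside
 * the tolerance when compared with a host-computed reference. */
int count_mismatches(int n, const float *device_result,
                     const float *host_reference,
                     double rel_tol, double abs_tol)
{
    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (!nearly_equal(device_result[i], host_reference[i],
                          rel_tol, abs_tol))
            ++bad;
    return bad;
}
```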

Q Do PGI Accelerator compilers support double-precision?

A Yes.

Q Can I call a CUDA kernel function from my PGI compiled code?

A PGI is working on the design of a feature to allow you to call kernel functions written in CUDA or PTX or other languages directly from your C or Fortran program. We will announce this feature when it is available.

Q Does the compiler support two or more accelerators in the same program?

A As with CUDA, you can use two or more GPUs by using multiple threads, where each thread attaches to a different GPU and runs its kernels on that GPU. The current release does not include support to automatically control two or more GPUs from the same accelerator region.
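The thread-per-GPU approach can be sketched in C. This assumes OpenMP for the host threads and the OpenACC runtime routines `acc_get_num_devices()` and `acc_set_device_num()`. The function `add_one_multi_device` is hypothetical. The sketch numbers devices from 0, but numbering conventions have differed between OpenACC versions and implementations, so check what your compiler release expects. Without OpenMP or OpenACC, the code falls back to a single serial pass on the host.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Adds 1.0f to every element of x, giving each host thread one
 * contiguous chunk and (when available) its own accelerator. */
void add_one_multi_device(int n, float *x)
{
    int ndev = 1;
#ifdef _OPENACC
    ndev = acc_get_num_devices(acc_device_not_host);
    if (ndev < 1)
        ndev = 1;                 /* no accelerator: one host chunk */
#endif

    /* One host thread per device; ignored when OpenMP is disabled. */
#pragma omp parallel for num_threads(ndev)
    for (int dev = 0; dev < ndev; ++dev) {
        int lo = (int)((long long)n * dev / ndev);
        int hi = (int)((long long)n * (dev + 1) / ndev);
#ifdef _OPENACC
        /* Bind this thread to its own device (0-based numbering assumed). */
        if (acc_get_num_devices(acc_device_not_host) > 0)
            acc_set_device_num(dev, acc_device_not_host);
#endif
        /* Each thread moves and processes only its own slice x[lo:hi). */
#pragma acc kernels loop copy(x[lo:hi - lo])
        for (int i = lo; i < hi; ++i)
            x[i] += 1.0f;
    }
}
```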

Q Why does PGI support OpenACC instead of focusing solely on the PGI Accelerator model?

A The PGI Accelerator model has been used successfully on NVIDIA GPUs by many developers. PGI explicitly and carefully designed our model to be portable across device types, and specifically did not put the "PGI" name in the directives. From the very first, we were thinking and planning towards standardizing the model.

Several system suppliers now produce products using GPUs as accelerators. Scientists and ISVs are more willing to adopt a model standardized across compiler vendors than one supported by a single vendor, regardless of how well designed and supported it may be. OpenACC is that model. More users will drive innovation among compiler vendors, which will benefit all users.

Q Will PGI be dropping support for the PGI Accelerator directive syntax?

A PGI has no plans to drop support for PGI Accelerator syntax.

Q Can I run my program on a machine that doesn't have an accelerator on it?

A Yes. PGI Accelerator compilers can generate PGI Unified Binary™ technology executables that work in the presence or absence of an accelerator.
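If the same executable may or may not find an accelerator, it can be useful (for logging or testing, say) to check where a compute region actually ran. The small sketch below uses the OpenACC `acc_on_device()` routine and guards it with the `_OPENACC` macro, so the same source also builds with compilers that ignore the directives. The function name is hypothetical.

```c
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Returns 1 if the compute region executed on an accelerator, 0 if it
 * ran on the host (including builds without OpenACC support). */
int region_ran_on_accelerator(void)
{
    int on_device = 0;
#pragma acc parallel num_gangs(1) copy(on_device)
    {
#ifdef _OPENACC
        /* acc_on_device(acc_device_host) is true when running on the host. */
        on_device = !acc_on_device(acc_device_host);
#endif
    }
    return on_device;
}
```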

Q Do I have to rebuild my application for each different model GPU?

A The GPU code generated uses the same technology that is used for graphics applications and games; that is, the program uses a portable intermediate format which is then dynamically translated and re-optimized at run time by the drivers supplied by the vendor for the particular model of GPU in your machine. This preserves your investment by allowing your programs to continue to work even when you upgrade your GPU card, or use your program on a machine with a different model of GPU.

Q Can I use function or procedure calls in my GPU code?

A Current GPUs do not support function calls. The compiler will support function calls only if they can be inlined.
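In practice, helpers called from a compute region should be small and visible to the compiler so they can be inlined, for example as `static inline` functions in the same file. The sketch below uses hypothetical names. Whether a given call is actually inlined depends on the compiler and its options, so consult the compiler documentation for the relevant inlining controls.

```c
/* Small helper defined in the same translation unit so the compiler
 * can inline it into the accelerator region. */
static inline float axpy_elem(float a, float x, float y)
{
    return a * x + y;
}

/* y = a * x + y, with the per-element work done by the inlinable helper. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
#pragma acc kernels loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = axpy_elem(a, x[i], y[i]);  /* must be inlined for the GPU */
}
```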

Q In what timeframe will PGI be including OpenMP TR1 support?

A We do not believe OpenMP TR1 can be efficiently supported on target accelerators such as NVIDIA, AMD or other GPUs. The OpenMP accelerator subcommittee is working to modify the proposed TR1 directives and model to broaden its applicability. We hope this will be successful. PGI will continue to aggressively support OpenMP directives in our products, but we have no plans to add the OpenMP accelerator extensions until they are accepted into the OpenMP standard and only if they can be implemented efficiently across the broad range of HPC accelerator targets.

Q When will you support <my favorite feature> in your compiler?

A Some features cannot be supported due to limitations of the hardware. Other features are not being supported because they would not deliver satisfactory performance. Still other features are planned for future implementation. Your feedback can affect our priorities.

Q Which OpenACC directives are supported in which release?

A The following list shows the OpenACC 1.0 features and the PGI Release 2012 version (12.x) in which each was added. A dash (--) indicates that the feature was not yet supported.

Feature                                 Version

!$acc kernels                           12.3
  clauses:
    if()                                12.3
    async()                             12.3
    copy()                              12.3
    copyin()                            12.3
    copyout()                           12.3
    create()                            12.3
    present()                           12.3
    present_or_copy()                   12.3
    present_or_copyin()                 12.3
    present_or_copyout()                12.3
    present_or_create()                 12.3
    deviceptr()                         12.3

!$acc parallel                          12.5
  clauses:
    if()                                12.5
    async()                             12.5
    num_gangs()                         12.5
    num_workers()                       12.6
    vector_length()                     12.5
    reduction()                         12.6
    copyin()                            12.5
    copyout()                           12.5
    create()                            12.5
    present()                           12.6
    present_or_copy()                   12.6
    present_or_copyin()                 12.6
    present_or_copyout()                12.6
    present_or_create()                 12.6
    deviceptr()                         12.6
    private()                           12.6
    firstprivate()                      --

!$acc data                              12.3
  clauses:
    if()                                12.3
    async()                             12.3
    copy()                              12.3
    copyin()                            12.3
    create()                            12.3
    present()                           12.3
    present_or_copy()                   12.3
    present_or_copyin()                 12.3
    present_or_copyout()                12.3
    present_or_create()                 12.3
    deviceptr() in C                    12.3
    deviceptr() in Fortran              --

!$acc loop                              12.3
  clauses:
    collapse()                          12.6
    within kernels region:
      gang()                            12.5
      worker()                          12.5
      vector()                          12.5
    seq()                               12.3
    private()                           12.3
    reduction()                         12.6
    within parallel region:
      gang                              12.6
      worker                            12.6
      vector                            12.6

!$acc declare                           12.3
  clauses:
    copy()/copyin()                     12.3
    copyin()/copyout()                  12.3
    create()                            12.3
    present()                           12.3
    present_or_copy()                   12.3
    present_or_copyin()                 12.3
    present_or_copyout()                12.3
    present_or_create()                 12.3
    device_resident()                   12.6
    deviceptr()                         12.6

!$acc update                            12.3
  clauses:
    if()                                12.3
    async()                             12.3

!$acc cache                             12.6
!$acc host_data                         --
!$acc wait                              12.3

Runtime routines:
    openacc module                      12.3
    openacc.h C header file             12.3
    openacc_lib.h Fortran header file   12.3
    acc_get_num_devices()               12.3
    acc_set_device_type()               12.3
    acc_get_device_type()               12.3
    acc_set_device_num()                12.3
    acc_get_device_num()                12.3
    acc_async_test()                    12.3
    acc_async_test_all()                12.3
    acc_async_wait()                    12.3
    acc_async_wait_all()                12.3
    acc_init()                          12.3
    acc_shutdown()                      12.3
    acc_on_device()                     12.3
    acc_malloc() for C                  12.3
    acc_free() for C                    12.3

Preprocessing:
    _OPENACC                            12.3

Environment variables:
    ACC_DEVICE_TYPE                     12.3
    ACC_DEVICE_NUM                      12.3

PGI Extensions:
    acc_copyin                          12.6
    acc_copyout                         12.6
    acc_create                          12.6
    acc_delete                          12.6
    acc_update_host                     12.6
    acc_update_device                   12.6
    acc_updatein                        12.6
    acc_updateout                       12.6
    acc_ispresent                       12.6
    acc_deviceptr                       12.6
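A short C sketch can exercise a few entries from the table. It uses a `parallel` region with a `reduction()` clause, plus the `_OPENACC` preprocessing macro, which lets the same source build with compilers that do not support OpenACC. The function names are hypothetical. The combined `parallel loop` form used here is an assumption, because the table does not list combined constructs separately. The `_OPENACC` macro expands to an integer identifying the supported specification version.

```c
/* Double-precision dot product; each gang/vector lane accumulates a
 * partial sum that the reduction clause combines into sum. */
double dot(int n, const double *restrict x, const double *restrict y)
{
    double sum = 0.0;
#pragma acc parallel loop reduction(+:sum) copyin(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Returns the value of _OPENACC, or 0 when not compiled with OpenACC. */
long openacc_version(void)
{
#ifdef _OPENACC
    return _OPENACC;
#else
    return 0;
#endif
}
```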

Q How much does it cost?

A License pricing for the PGI Accelerator compilers can be found in the pricing section. If you are a current PGI licensee, you may upgrade your license in accordance with PGI's standard product upgrade policy.

Q How can I try it?

A To try out the PGI Accelerator compilers, follow these three steps:

  1. Download any of the available software packages for your operating system.
  2. Review the PGI Installation Guide or the PGI Visual Fortran Installation Guide and configure your environment.
  3. Obtain license keys. Available options include:
    1. You have a current PGI subscription: you will need to retrieve your upgraded permanent license keys.
    2. Your PGI subscription has expired: you can either generate 15-day trial keys as outlined in option 3 below, or bring your subscription current and gain access to the accelerator feature through updated permanent license keys.
    3. You don't have a PGI license: you can generate 15-day trial license keys. The trial keys and all executable files compiled using them will cease operating at the end of the 15-day trial period.

Please contact PGI Sales for exchange, upgrade or subscription renewal information.
