PGI CUDA-x86

CUDA-x86

NVIDIA CUDA was developed to enable offloading computationally intensive kernels to massively parallel GPUs. Through API function calls and language extensions, CUDA gives developers explicit control over the mapping of general-purpose computational kernels to GPUs, as well as the placement and movement of data between an x86 processor and the GPU.

The PGI CUDA C/C++ compiler for x86 platforms allows developers using CUDA to compile and optimize their CUDA applications to run on x86-based workstations, servers and clusters with or without an NVIDIA GPU accelerator. When run on x86-based systems without a GPU, PGI CUDA C applications use multiple cores and the streaming SIMD (Single Instruction Multiple Data) capabilities of Intel and AMD CPUs for parallel execution.

PGI CUDA C/C++ for Multi-core x86

The PGI CUDA C/C++ compiler implements the current NVIDIA CUDA C language for GPUs and will continue to track the evolution of CUDA C closely. The PGI CUDA C/C++ for x86 implementation is proceeding in phases.

Implementation Overview

The PGI CUDA C/C++ for x86 compiler treats CUDA C as a native parallel programming language for multi-core x86. This includes:

  • Inlining device kernel functions
  • Translating chevron syntax to parallel/vector loops
  • Using multiple cores and SSE/AVX instructions

At run time, CUDA C programs compiled for x86 execute each CUDA thread block on a single host core, eliminating synchronization where possible. CUDA host code supports all PGI optimizations for Intel/AMD processors. As shown in the following table, well-structured CUDA C programs for multi-core x86 can approach the efficiency and performance of the same algorithm written using other parallel programming models such as OpenMP.
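One way to see why running a whole thread block on one core can remove synchronization is the following plain C++ sketch. It uses a generic technique, splitting the per-thread loop at the barrier, and the source does not say that PGI implements it this way. The function names `reverse_tile_block` and `reverse_tiles` are hypothetical; they model a kernel that reverses each block-sized tile of an array using shared memory and a `__syncthreads()` barrier between its two phases.

```cpp
#include <cstddef>
#include <vector>

// Runs one thread block of a hypothetical kernel that reverses its tile of
// `data` in place. In CUDA, the two phases below would be separated by
// __syncthreads().
void reverse_tile_block(unsigned blockIdx, unsigned blockDim, float* data) {
    std::vector<float> tile(blockDim);  // stands in for __shared__ memory
    float* base = data + static_cast<std::size_t>(blockIdx) * blockDim;

    // Phase 1: each "thread" t copies its own element into the tile.
    for (unsigned t = 0; t < blockDim; ++t) {
        tile[t] = base[t];
    }

    // The barrier would sit here. Because a single core has already run
    // phase 1 for every thread in the block, no runtime synchronization
    // is required before phase 2 begins.

    // Phase 2: each "thread" t reads an element written by another thread.
    for (unsigned t = 0; t < blockDim; ++t) {
        base[t] = tile[blockDim - 1 - t];
    }
}

// Hypothetical stand-in for launching the kernel over `gridDim` blocks.
void reverse_tiles(unsigned gridDim, unsigned blockDim, float* data) {
    for (unsigned b = 0; b < gridDim; ++b) {
        reverse_tile_block(b, blockDim, data);
    }
}
```

With two blocks of four threads over the array 0 through 7, each tile is reversed independently. The barrier costs nothing at run time because it has become the boundary between two sequential loops.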

Additional Resources

FAQ


Q What is CUDA C for x86?

A The "PGI CUDA C for x86" compiler is a new tool that enables CUDA developers to deploy their applications on systems based on the industry-standard x86 architecture.

Q Why do this?

A Parallel application developers need flexibility. They want to be able to create and deploy their applications on a wide range of HPC systems. The new PGI CUDA C compiler enables developers to write parallel CUDA C applications that can run on x86 workstations, servers and clusters—with or without NVIDIA GPUs.

Q How is NVIDIA involved? Do they support it?

A NVIDIA announced their CUDA-x86 plans at the 2010 GPU Technology Conference in September.

Q Won't CUDA code run slower on an x86 cluster than it would on a GPU? Why would anybody want to do that?

A Supporting CUDA C on multi-core x86 allows developers to write parallel applications that can be used on systems with or without GPU accelerators. CUDA C is simply a parallel programming model and language. While it was designed with the structure required for efficient GPU programming, it can also be compiled for efficient execution on x86, using multiple cores and SSE/AVX instructions for parallel execution. We believe that CUDA, as an x86 parallel programming model, should be able to reach performance comparable to other parallel programming models such as OpenMP.

Q Are executables built with the PGI compiler able at runtime to automatically split the workload to use both the CPU and GPU if present?

A Executables built with the PGI compiler are able to run parallel CUDA C kernels on either multi-core x86 or NVIDIA GPUs, but there are no current plans to split kernels across CPU and GPU.

Q Will PGI also release a version of CUDA Fortran for x86?

A Yes. The intent is to provide the same capabilities in both CUDA C and CUDA Fortran.

Q Will all CUDA applications run on x86?

A PGI has no current plans to support the CUDA C Driver API; that would be very difficult to do. All CUDA C applications written using the Runtime API should run on multi-core x64 with the PGI compiler. It will take a few releases to reach a point where all NVIDIA CUDA C for GPU features (e.g. support for texture memory) are implemented. Some low-level coding strategies that depend on specific features of the device (like warp size) are not portable.

Q Do I need to install NVIDIA CUDA software to use the PGI CUDA compilers?

A No. The elements of the NVIDIA CUDA toolkit needed to compile and execute CUDA C programs (header files, for example) are bundled with the PGI compilers. CUDA-x86 also includes a subset of the NVIDIA CUDA C SDK, modified where necessary to run on x86.

Q Which processors does CUDA-x86 support?

A PGI compilers are optimized for performance on the latest 64-bit microprocessors from AMD and Intel. Currently, CUDA-x86 does not take advantage of all the performance capabilities available from these processors. The next release will optimize both the sequential and massively parallel components of CUDA applications for these same processors.

Q Which operating systems does PGI support with its CUDA-x86 compilers?

A Linux and OS X.

Q Which CUDA C features are missing from CUDA-x86?

A Most CUDA C capabilities are included in CUDA-x86. Support for some features will be added later while there are no plans to support others.

Q Will there also be a CUDA x64?

A Unfortunately, there is some misconception about what the terms x86 and x64 mean. All PGI CUDA compilers include support for both 32-bit and 64-bit platforms to the extent that support for those platforms is available from NVIDIA.

Q How much does it cost and what's your upgrade plan?

A PGI CUDA-x86 support is included in PGI's accelerator-enabled C/C++ compiler products (the PGI Accelerator products). PGI Accelerator licensees with a current PGI Subscription get PGI CUDA-x86 at no extra charge. Customers without GPU-enabled licenses can upgrade by paying just the difference in the license fee for a GPU-enabled version. Contact PGI sales for more information.
