PGI User Forum

How to produce OpenCL executable on an NVidia card?

 
Jing Li



Joined: 12 Dec 2013
Posts: 6

Posted: Fri Jun 13, 2014 6:43 am    Post subject: How to produce OpenCL executable on an NVidia card?

I recently started using the pgcc compiler, and I have two questions:
1. Is it possible to generate OpenCL code for an NVIDIA card?
2. Is it possible to obtain the kernel code (.cu or .cl file) that pgcc generates?
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Fri Jun 13, 2014 9:17 am    Post subject:

Hi Jing Li,

For NVIDIA devices, we only target CUDA C or LLVM. When targeting AMD devices, we generate OpenCL or LLVM instead.

To see the generated device code, use the "keep" sub-option: "-ta=<target>:keep".

The kernels will be located in the "filename.*.gpu" files.
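
As a rough illustration of the "keep" sub-option, here is a minimal OpenACC sketch in C. The file name saxpy.c and the saxpy function are hypothetical and do not come from this thread; the "-acc" flag on the command line is an assumption beyond what was posted above, while "-ta=tesla:keep" is the sub-option described in the reply.

```c
/* saxpy.c: hypothetical example file, not part of the original thread.
 *
 * Assumed compile line (only -ta=tesla:keep is taken from the reply above):
 *   pgcc -acc -ta=tesla:keep -c saxpy.c
 * Per the reply, the generated kernels should then appear in files
 * matching "saxpy.*.gpu" next to the object file.
 */
#include <assert.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * The OpenACC directive asks the compiler to offload the loop;
 * a compiler without OpenACC support ignores the pragma and runs
 * the loop on the host. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The kept files are the compiler's intermediate device code (CUDA C or LLVM for the tesla target), so they are mainly useful for inspection rather than as hand-editable sources.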

Hope this helps,
Mat

FYI, here's the current 14.6 list of "-ta" sub-options:

Code:
% pgfortran -help -ta
-ta=tesla:{[no]autocollapse|[no]fma|[no]flushz|keep|llvm|loadcache:{L1|L2}|[no]unroll|maxregcount:<n>|[no]rdc|[no]required|cc1x|tesla|cc1+|tesla+|cc2x|fermi|cc2+|fermi+|cc3x|kepler|cc3+|kepler+|fastmath|pin|cuda5.5|cuda6.0}|nvidia|radeon:{keep|llvm|[no]unroll|[no]required|tahiti|capeverde|spectre|buffercount:<n>}|host
                    Choose target accelerator
    tesla           Select NVIDIA Tesla accelerator target
     [no]autocollapse
                    Automatically collapse tightly nested loops
     [no]fma        Generate fused mul-add instructions (default at -O3)
     [no]flushz     Enable flush-to-zero mode on the GPU
     keep           Keep kernel files
     llvm           Use LLVM back end; disables cc1x
     loadcache      Choose what hardware level cache to use for global memory loads
      L1            Use L1 cache
      L2            Use L2 cache
     [no]unroll     Enable automatic inner loop unrolling (default at -O3)
     maxregcount:<n>
                    Set maximum number of registers to use on the GPU
     [no]rdc        Generate relocatable device code
     [no]required   Issue compiler error if the compute regions fail to accelerate
     cc1x|tesla     Compile for compute capability 1.x
     cc1+|tesla+    Compile for compute capability 1.x and above
     cc2x|fermi     Compile for compute capability 2.x
     cc2+|fermi+    Compile for compute capability 2.x and above (default)
     cc3x|kepler    Compile for compute capability 3.x
     cc3+|kepler+   Compile for compute capability 3.x and above
     fastmath       Use fast math library
     pin            Set default to pin host memory
     cuda5.5        Use CUDA 5.5 Toolkit compatibility
     cuda6.0        Use CUDA 6.0 Toolkit compatibility
    nvidia          nvidia is a synonym for tesla
    radeon          Select AMD Radeon GPU accelerator target
     keep           Keep kernel source files
     llvm           Use LLVM/SPIR back end
     [no]unroll     Enable automatic inner loop unrolling (default at -O3)
     [no]required   Issue compiler error if the compute regions fail to accelerate
     tahiti         Compile for Radeon Tahiti architecture (default)
     capeverde      Compile for Radeon Capeverde architecture
     spectre        Compile for Radeon Spectre architecture
     buffercount:<n>
                    Set max number of device buffers used by OpenCL kernel
    host            Compile for the host, i.e., no accelerator target
Jing Li



Joined: 12 Dec 2013
Posts: 6

Posted: Sun Jun 15, 2014 1:36 am    Post subject:

Hi Mat,
Thanks for your quick response. So the PGI compiler does not support generating OpenCL code that can run on a CUDA-enabled device. Am I correct?
mkcolg



Joined: 30 Jun 2004
Posts: 5952
Location: The Portland Group Inc.

Posted: Mon Jun 16, 2014 8:03 am    Post subject:

Quote:
So PGI compiler does not support generating OpenCL code that can run on a CUDA enabled device, am I correct on this?
Correct. Given that CUDA is available on NVIDIA devices and the underlying OpenACC device code should be transparent to the user, there was no reason to support OpenCL target generation.

- Mat
All times are GMT - 7 Hours