PGI User Forum
pgcc compile error in OpenACC-CUDA interoperability example
i_alex2004



Joined: 18 Aug 2012
Posts: 8

Posted: Wed Dec 19, 2012 10:16 am    Post subject: pgcc compile error in OpenACC-CUDA interoperability example

Hello, I have written some code to demonstrate OpenACC-CUDA interoperability, and I am having problems compiling it. Could you please help me find the problem?

Code:

#include <stdio.h>
#include <cuda.h>
#include <curand_kernel.h>

static const int N=100;

void vecAdd (float restrict *a, float restrict *b, float restrict*sum)
{
   #pragma acc kernels loop present (a[N], b[N], sum[N]);
   for (int i=0; i<N; i++)
      sum[i]=a[i]+b[i];
}

__global__ void setup_kernel ( curandState * state, unsigned long seed )
{
   int id = threadIdx.x + blockIdx.x * 64;
   if (id<N){
      //seed, sequence, offset, state
      curand_init ( seed, id, 0, &state[id] );
   }
}

__global__ void generate( curandState* globalState, float * array1, float * array2, float * array3 )
{
   int ind = threadIdx.x;
   if (ind <N){
      curandState localState = globalState[ind];
      float RANDOM = curand_uniform( &localState );
      array1[ind] = RANDOM;
      array2[ind] = (RANDOM+5)/123;
      array3[ind] = RANDOM+2;
      globalState[ind] = localState;
   }
}

int main()
{
         curandState* devStates;
   float * a, * b, * sum, * vec;
   
   cudaMalloc ( &a, N*sizeof(float));
   cudaMalloc ( &b, N*sizeof(float));
   cudaMalloc ( &sum, N*sizeof(float));
   cudaMalloc ( &vec, N*sizeof(float));
   cudaMalloc ( &devStates, N*sizeof( curandState ));

   setup_kernel <<< N/256+1, 256 >>> (a, time(NULL));
   generate <<< N/256+1, 256 >>> ( devStates, a, b );

   #pragma acc declare device_resident (a[N], b[N], sum[N], vec[N])
   vecAdd (a,b,sum);

   cublasInit();
   cublasSaxpy(N, 2.0, sum, 2, vec, 1);
   cublasShutdown();
   
   #pragma acc host_data use_device (sum);
      
   for (int i=0; i<N; i++)
      printf(sum[i],"\n");
   return 0;
}


I compile it with:
Code:

pgcc -acc -I/opt/pgi/linux86-64/2012/cuda/4.2/include -Minfo=accel -L /opt/pgi/linux86-64/2012/cuda/4.2/lib64 -lcurand -ta=nvidia interop.c


And receive the following error:

Code:

PGC-F-0249-#error --  --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! --- (/opt/pgi/linux86-64/2012/cuda/4.2/include/host_defines.h: 128)
PGC/x86-64 Linux 12.10-0: compilation aborted


Thanks a lot!
mkcolg



Joined: 30 Jun 2004
Posts: 6639
Location: The Portland Group Inc.

Posted: Wed Dec 19, 2012 10:32 am

Hi i_alex2004,

Unfortunately, NVIDIA hasn't updated their header files to allow pgcc to compile them. Also, pgcc doesn't support CUDA C extensions. Our C++ compiler, pgcpp, does but only when targeting x86, not NVIDIA GPUs.

What this means is that you need to compile your CUDA C code with nvcc, and OpenACC code with PGI, and the two can't be mixed in the same file. The objects and device pointers are interoperable.
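
As a rough sketch of the OpenACC half of such a split (not taken from the original code and not tested with PGI 12.10), the file below contains no CUDA headers, so pgcc can compile it on its own. The names `vecAdd` and `addOnDevice` and the explicit length argument `n` are assumptions made for illustration. Pointers that already hold device addresses, such as ones returned by cudaMalloc in a separate nvcc-compiled file, are described to OpenACC with the `deviceptr` clause. The `present` clause used in the first post only applies to data that OpenACC itself placed on the device. Here a `host_data` region stands in for the CUDA side, so the sketch needs no CUDA code at all.

```c
/* acc_part.c: hypothetical OpenACC-only translation unit, no CUDA headers.
 * A compiler without OpenACC support ignores the pragmas and runs the
 * loop on the host, which is enough to check the logic. */

/* When OpenACC is enabled, a, b and sum must already be device addresses. */
void vecAdd(const float *restrict a, const float *restrict b,
            float *restrict sum, int n)
{
    /* deviceptr: the pointers are raw device pointers, not host arrays
     * that OpenACC has to look up or copy. */
    #pragma acc parallel loop deviceptr(a, b, sum)
    for (int i = 0; i < n; i++)
        sum[i] = a[i] + b[i];
}

/* Stand-in for the nvcc-compiled side: let OpenACC copy the host arrays
 * to the device, then hand the device addresses to vecAdd. In a real
 * split these addresses would come from cudaMalloc instead. */
void addOnDevice(float *a, float *b, float *sum, int n)
{
    #pragma acc data copyin(a[0:n], b[0:n]) copyout(sum[0:n])
    {
        #pragma acc host_data use_device(a, b, sum)
        {
            vecAdd(a, b, sum, n);
        }
    }
}
```

Calling `addOnDevice(a, b, sum, n)` with ordinary host arrays fills `sum` with the element-wise sums. The CUDA kernels and the cudaMalloc calls would live in a separate .cu file built with nvcc, and the two object files would then be linked together.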

Hope this helps,
Mat
millad



Joined: 02 Jun 2016
Posts: 2

Posted: Mon Jun 13, 2016 11:34 am

Hi mkcolg,

Are there any updates on this feature?

I also want to do some interoperability between CUDA and OpenACC, but since I am compiling a big package with CMake, I can't specify the compiler (nvcc or pgcc/pgc++) per source file. Also, some files in the package have a .c or .cpp suffix but use CUDA features.

So, in a nutshell, I need the PGI compiler's interoperability feature, but I need the compiler to recognize these files on its own, without my specifying them manually.

Do you know whether there is a plan for this or not?

I am using PGI Compiler 16.5 (trial) and NVCC 7.0/7.5.


Regards,
Millad
mkcolg



Joined: 30 Jun 2004
Posts: 6639
Location: The Portland Group Inc.

Posted: Mon Jun 13, 2016 1:51 pm

Hi Millad,

To intermix both CUDA 7.5 and OpenACC in the same source file, use nvcc as the command line compiler and pgc++ as the host compiler.

Note that CUDA 7.5 contains an error in the "/opt/cuda-7.5/include/host_config.h" header file: it restricts usage to PGI 15.4 only. You will need to edit the file at line 87 to remove this check.
Change:
Code:

#if __PGIC__ != 15 || __PGIC_MINOR__ != 4 || !defined(__GNUC__) || !defined(__LP64__)

#error -- unsupported pgc++ configuration! Only pgc++ 15.4 on Linux x86_64 is supported!

#endif
to
Code:

#if !defined(__GNUC__) || !defined(__LP64__)

#error -- unsupported pgc++ configuration! Only pgc++ 15.4 on Linux x86_64 is supported!

#endif

Here's an example which I derived from the CUDA vectorAdd sample:

Code:
% cat vectorAdd.cpp
#include <stdio.h>
#include <stdlib.h>   // malloc, free, rand, exit
#include <math.h>     // fabs
#include <cuda_runtime.h>
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;

    if (i < numElements)
    {
        C[i] = A[i] + B[i];
    }
}

/**
 * Host main routine
 */
int
main(void)
{
    cudaError_t err = cudaSuccess;
    int numElements = 50000;
    size_t size = numElements * sizeof(float);
    printf("[Vector addition of %d elements]\n", numElements);
    float *h_A = (float *)malloc(size);
    float *h_B = (float *)malloc(size);
    float *h_C = (float *)malloc(size);
    for (int i = 0; i < numElements; ++i)
    {
        h_A[i] = rand()/(float)RAND_MAX;
        h_B[i] = rand()/(float)RAND_MAX;
    }

#pragma acc data copyin(h_A[0:numElements],h_B[0:numElements]), copyout(h_C[0:numElements])
{
#pragma acc host_data use_device(h_A,h_B,h_C)
{
    // Launch the Vector Add CUDA Kernel
    int threadsPerBlock = 256;
    int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
    printf("CUDA kernel launch with %d blocks of %d threads\n", blocksPerGrid, threadsPerBlock);
    vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(h_A, h_B, h_C, numElements);
    err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!\n", cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
} // end host_data
} // end acc data region
    for (int i = 0; i < numElements; ++i)
    {
        if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
        {
            fprintf(stderr, "Result verification failed at element %d!\n", i);
            exit(EXIT_FAILURE);
        }
    }

    printf("Test PASSED\n");

    // Free host memory
    free(h_A);
    free(h_B);
    free(h_C);

    printf("Done\n");
    return 0;
}

% nvcc -x cu -ccbin pgc++ -Xcompiler " -w -ta=tesla:cuda7.5 -Mcuda -V16.5 -Minfo=accel" vectorAdd.cpp
main:
      1, include "tmpxft_00008abb_00000000-4_vectorAdd.cudafe1.cpp"
           3, include "vectorAdd.cu"
               60, include "device_types.h"
                    69, include "builtin_types.h"
                         65, include "host_runtime.h"
                              72, include "stddef.h"
                                  214, include "driver_types.h"
                                      1445, include "surface_types.h"
                                            114, include "texture_types.h"
                                                 208, include "vector_types.h"
                                                       32, include "vectorAdd.cu"
                                                            34, Generating copyin(h_A[:numElements],h_B[:numElements])
                                                                Generating copyout(h_C[:numElements])
% a.out
[Vector addition of 50000 elements]
CUDA kernel launch with 196 blocks of 256 threads
Test PASSED
Done
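
The block count in the launch above comes from a rounded-up integer division, so that the grid covers every element even when the element count is not a multiple of the block size. A minimal sketch of that arithmetic in plain C follows; the helper name `blocksFor` is made up for illustration and does not appear in the example.

```c
/* Smallest number of blocks such that
 * blocks * threadsPerBlock >= numElements (ceiling division). */
int blocksFor(int numElements, int threadsPerBlock)
{
    return (numElements + threadsPerBlock - 1) / threadsPerBlock;
}
```

For 50000 elements and 256 threads per block this gives 196, matching the program output. Those 196 blocks contain 50176 threads, and the `if (i < numElements)` test in the kernel keeps the 176 surplus threads from touching memory past the end of the arrays.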


Hope this helps,
Mat
millad



Joined: 02 Jun 2016
Posts: 2

Posted: Mon Jun 13, 2016 3:37 pm

Thanks for the reply, Mat.

It worked. I was able to compile the file that I had a problem with.

However, when I try to compile the whole package with nvcc set as the main compiler for C/C++ files, CMake reports an error that CMAKE_C_COMPILER is set to a C++ compiler. I don't think a C version of nvcc exists, right?

Actually, I am trying to compile GROMACS with PGI. The latest version of it relies on CMake for its build system.

I also have a problem with passing the arguments that follow the -Xcompiler option when using CMake. Since they are in double quotes, I am not sure how to pass them through CMake's -DCMAKE_C_FLAGS variable.

P.S.: Thanks for the hint on "host_config.h". It helped. I hesitate to change header files like that, since I think I might be the one handling the situation wrong, not a global header file like host_config.h.