PGI User Forum


copyin behavior change in 12.5?

 
njustn



Joined: 09 Nov 2011
Posts: 22

Posted: Fri Jul 06, 2012 7:54 pm    Post subject: copyin behavior change in 12.5?

Hi, I upgraded to 12.5 today and found that 3/4 of the codes I've been working with, all written with PGI ACC directives, would no longer compile. I'm not sure which release introduced the change (I think the previous version in use was 12.3), but either way, this is the error.

Code:
make -C kmeans/
make[1]: Entering directory `/home/njustn/omp-co-repo/kmeans'
pgcc -mp=allcores  -O3 -fast -Minfo=accel,mp  -DPGI -I/opt/pgi/linux86-64/2012/cuda/4.1/include -I/opt/pgi/linux86-64/2012/include_acc -ta=nvidia,keepgpu,keepptx,nofma -c99   -I. -I../common -c omp_main.c -o omp_main.o
main:
     68, Parallel region activated
     78, Parallel region terminated
pgcc -mp=allcores  -O3 -fast -Minfo=accel,mp  -DPGI -I/opt/pgi/linux86-64/2012/cuda/4.1/include -I/opt/pgi/linux86-64/2012/include_acc -ta=nvidia,keepgpu,keepptx,nofma -c99   -I. -I../common  -c omp_kmeans.c
omp_kmeans.001.gpu(43): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(43): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(43): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(59): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(59): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(59): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(75): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(75): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(75): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(76): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(76): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(91): error: expression must have arithmetic or enum type

omp_kmeans.001.gpu(105): error: expression must have integral or enum type

17 errors detected in the compilation of "/tmp/pgnvd_wob6Wbeavuz.nv0".
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (omp_kmeans.c: 210)
omp_kmeans:
    148, Parallel region activated
         Parallel loop activated with static block schedule
    153, Barrier
         Parallel region terminated
    164, Generating copyin(fo[0:1])
    185, Parallel region activated
    210, Generating copyin(fc[0:numClusters*numCoords])
         Generating copyin(gte)
         Generating copyin(gts)
         Generating copyin(numClusters)
         Generating copyin(numObjs)
         Generating copyin(numCoords)
    225, Loop is parallelizable
         Accelerator kernel generated
        225, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
    238, Loop is parallelizable
    244, Loop carried scalar dependence for 'min_dist' at line 255
         Scalar last value needed after loop for 'index' at line 260
    249, Loop is parallelizable
    271, Parallel region terminated


After some confusion, I realized that the compiler had decided that all of my basic int variables were actually char * arrays for some reason. These are the relevant chunks of code.

Code:

float** omp_kmeans(int     is_perform_atomic, /* in: */
                   float **objects,           /* in: [numObjs][numCoords] */
                   int     numCoords,         /* no. coordinates */
                   int     numObjs,           /* no. objects */
                   int     numClusters,       /* no. clusters */
                   float   threshold,         /* % objects change membership */
                   int    *membership)        /* out: [numObjs] */
...
#pragma acc region for \
                    deviceptr(data)\
                    deviceptr(cfo)\
                    private(index,i,j,k,dist,min_dist)\
                    copyin(numCoords,numObjs,numClusters,gts,gte)\
                    copyin(fc[0:numClusters*numCoords])


The important point is the int variables numCoords, numObjs, etc. When that copyin clause is removed, the code compiles successfully in 12.5, whereas both variants compile successfully with 11.10 (the other version I have immediate access to). Is this intentional behavior?
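
A minimal sketch of the workaround described above, assuming the scalars can simply be left out of the data clauses so that the compiler handles them on its own. The function name scale_clusters and its loop body are hypothetical and not taken from the k-means code; only the directive style and the fc sub-array expression mirror the original. A compiler without PGI Accelerator support ignores the pragma and runs the loop on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original application): multiplies every
 * cluster coordinate by the number of objects. */
void scale_clusters(float *fc, int numClusters, int numCoords, int numObjs)
{
    int i;

    assert(fc != NULL);

    /* Only the array appears in a data clause. The int scalars are
     * deliberately NOT listed in copyin, which is the change that let the
     * original code compile with 12.5. The sub-array expression is copied
     * from the original post. */
#pragma acc region for copy(fc[0:numClusters*numCoords])
    for (i = 0; i < numClusters * numCoords; i++) {
        fc[i] = fc[i] * (float)numObjs;
    }
}
```

Whether leaving the scalars implicit is acceptable depends on how much you want the data movement spelled out in the source; it is shown here only as a stopgap for 12.5.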
mkcolg



Joined: 30 Jun 2004
Posts: 6129
Location: The Portland Group Inc.

Posted: Mon Jul 09, 2012 11:43 am    Post subject:

Hi njustn,

Quote:
Is this intentional behavior?
Yes and no. To support C pointer manipulation, our engineers completely revamped how pointers operate on the device. Hence the generic "char *" pointers are expected; however, the "expression must have arithmetic or enum type" error was not. While we'd need to see your code to be sure, this message is very similar to another report (TPR#18694), which will be fixed in the next release.

If you can, please send a reproducing example of the error to PGI Customer Support (trs@pgroup.com) and ask them to forward it to me. I'll then confirm whether it's the same issue.

Thanks,
Mat
njustn




Posted: Tue Jul 10, 2012 6:57 pm    Post subject:

If you would like a complete application, I'll be happy to send along one of the ones that failed, but the build situation for my full applications is somewhat complicated at the moment, so it might be more trouble than it's worth for you. A minimal reproducing example is quite simple, though, and is copied here. The issue appears to be that before 12.5, plain non-pointer values were identified and treated accordingly; now they are blindly treated as "char *" values in the generated CUDA output, causing mathematical expressions to fail loudly.

Code:
int main(int argc, char * argv[]){
    int i=0, j=0;
#pragma acc region for copyin(argc)
    for(i=0; i<500000; i++){
        argv[i][0] *= argc;
    }
    return 0;
}
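
The error text is what a C compiler reports when a pointer is used where a number is required. The following plain-C sketch illustrates that failure mode only; it makes no claim about what PGI's generated code actually looks like. The function scale_by_generic_arg and its parameter names are hypothetical, and the rejected expression is left in a comment so that the sketch still compiles.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a kernel argument that arrives as a generic
 * byte pointer. Illustration only, not PGI's generated code. */
int scale_by_generic_arg(int value, const char *generic_arg)
{
    int factor;

    assert(generic_arg != NULL);

    /* return value * generic_arg;
     * The line above would be rejected: a pointer operand has no arithmetic
     * type, so the multiplication is ill-formed. */

    /* Recover the int stored behind the generic pointer, then multiply. */
    memcpy(&factor, generic_arg, sizeof factor);
    return value * factor;
}
```

The point of the sketch is that a scalar passed through a generic pointer has to be read back as its real type before it can take part in arithmetic.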
mkcolg




Posted: Wed Jul 11, 2012 8:01 am    Post subject:

Yep, this is the same problem. So, assuming that this is identical to the problem in your full program, it will be fixed in 12.6.

- Mat

Code:
% pgcc -acc njustin.c -V12.5 -Minfo=accel -Msafeptr
/tmp/pgaccbcigdvJW3WzI.gpu(20): error: expression must have arithmetic or enum type

1 error detected in the compilation of "/tmp/pgnvdOdig4-pDmu03.nv0".
PGC-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (jtin.c: 4)
main:
      4, Generating copyin(argc)
         Generating copy(argv[0:500000][0:1])
      5, Loop is parallelizable
         Accelerator kernel generated
          5, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
PGC/x86-64 Linux 12.5-0: compilation completed with warnings
% pgcc -acc njustin.c -V12.6 -Minfo=accel -Msafeptr
main:
      4, Generating copyin(argc)
         Generating copy(argv[0:500000][0:1])
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
      5, Loop is parallelizable
         Accelerator kernel generated
          5, #pragma acc for parallel /* blockIdx.x */
             CC 1.0 : 9 registers; 32 shared, 0 constant, 0 local memory bytes
             CC 2.0 : 12 registers; 0 shared, 48 constant, 0 local memory bytes
%
njustn




Posted: Sun Jul 15, 2012 2:37 pm    Post subject:

That's great news, thanks. I feel a great deal more comfortable when I have all of my variable copies explicitly listed out rather than trusting the automatic copies to behave as I expect.