PGI User Forum

Variables and constants on accelerator

 
nickaj



Joined: 06 Sep 2011
Posts: 21

Posted: Thu Oct 13, 2011 8:58 am    Post subject: Variables and constants on accelerator

I was recently trying to do a quick benchmark of a machine, comparing the execution speed between host and device. After some tinkering with the code, I came across the following.

If I have an accelerated loop such as:
Code:

#pragma acc region for parallel
  for(j=0; j<nPoints; j++){

    zre = cre = csetre[j];
    zim = cim = csetim[j];

    for(i=0; i<MAXITER; i++){
      ztemp = (zre*zre - zim*zim) + cre;
      zim   = 2.0*zre*zim + cim;
      zre   = ztemp;
      if ((zre*zre + zim*zim) > CEILING) break;
    }

    if((zre*zre + zim*zim) < CEILING)
      numInside++;
  }

I get one answer. If I replace MAXITER (a constant #defined in the source) with maxIter, an integer variable holding the same value, I get a different answer. I'd come across this before, so I compiled using

Code:

pgcc -o pgi_bug -O3 -ta=nvidia,cc20,nofma -Minfo=accel pgi_bug_producer.c -fastsse

...with the nofma option sorting out the maths so that device and host now give the same answer. The problem I now have is that the runtime with MAXITER is ~3 times that with maxIter.

Whilst it's not a problem at the moment, it'd be interesting to know why this happens.
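For checking either variant against the host, a plain serial version of the loop can serve as a reference. This is a minimal sketch, not the poster's actual source: the double element type, the function name count_inside, and passing the iteration limit and ceiling as parameters are all assumptions made for the example.

```c
#include <stddef.h>

/* Host-only reference for the loop in the post above.
   Assumptions: double data, limits passed in as parameters. */
long count_inside(const double *csetre, const double *csetim, size_t nPoints,
                  int maxIter, double ceiling) {
    long numInside = 0;
    for (size_t j = 0; j < nPoints; j++) {
        double cre = csetre[j], cim = csetim[j];
        double zre = cre, zim = cim;

        /* Iterate z = z*z + c until the limit or until |z|^2 exceeds ceiling. */
        for (int i = 0; i < maxIter; i++) {
            double ztemp = (zre * zre - zim * zim) + cre;
            zim = 2.0 * zre * zim + cim;
            zre = ztemp;
            if (zre * zre + zim * zim > ceiling) break;
        }

        /* Points still below the ceiling are counted as inside. */
        if (zre * zre + zim * zim < ceiling) numInside++;
    }
    return numInside;
}
```

The same function can be called once with a literal constant and once with an integer variable of the same value; on the host both calls should return the same count, which gives a baseline for comparing the two device results.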
mkcolg



Joined: 30 Jun 2004
Posts: 6213
Location: The Portland Group Inc.

Posted: Fri Oct 14, 2011 12:37 pm    Post subject:

Hi nickaj,

What does the -Minfo=accel output say about how the two versions are scheduled?

- Mat
nickaj



Joined: 06 Sep 2011
Posts: 21

Posted: Tue Oct 18, 2011 6:31 am    Post subject:

Here are the compiler outputs for both versions. My code runs both the MAXITER (constant) and maxIter (integer variable) versions in one program, hence the dual output. The MAXITER (constant) version is first.

Normal (i.e., pgcc -ta=nvidia,cc20 ...):
Code:

    162, Generating copyin(csetre[0:nPoints-1])
         Generating copyin(csetim[0:nPoints-1])
         Generating compute capability 2.0 binary
    163, Loop carried scalar dependence for 'numInside' at line 183
         Accelerator kernel generated
        163, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
             CC 2.0 : 15 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
        183, Sum reduction generated for numInside
    171, Loop carried scalar dependence for 'zim' at line 173
         Loop carried scalar dependence for 'zim' at line 174
         Scalar last value needed after loop for 'zim' at line 182
         Loop carried scalar dependence for 'zre' at line 173
         Loop carried scalar dependence for 'zre' at line 174
         Scalar last value needed after loop for 'zre' at line 182
         Accelerator restriction: scalar variable live-out from loop: zre
         Accelerator restriction: scalar variable live-out from loop: zim
         Inner sequential loop scheduled on accelerator
    196, Generating copyin(csetre[0:nPoints-1])
         Generating copyin(csetim[0:nPoints-1])
         Generating compute capability 2.0 binary
    197, Loop carried scalar dependence for 'numInside' at line 217
         Accelerator kernel generated
        197, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
             CC 2.0 : 15 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
        217, Sum reduction generated for numInside
    205, Loop carried scalar dependence for 'zim' at line 207
         Loop carried scalar dependence for 'zim' at line 208
         Scalar last value needed after loop for 'zim' at line 216
         Loop carried scalar dependence for 'zre' at line 207
         Loop carried scalar dependence for 'zre' at line 208
         Scalar last value needed after loop for 'zre' at line 216
         Accelerator restriction: scalar variable live-out from loop: zre
         Accelerator restriction: scalar variable live-out from loop: zim
         Inner sequential loop scheduled on accelerator



And using nofma (pgcc -ta=nvidia,cc20,nofma):
Code:

    162, Generating copyin(csetre[0:nPoints-1])
         Generating copyin(csetim[0:nPoints-1])
         Generating compute capability 2.0 binary
    163, Loop carried scalar dependence for 'numInside' at line 183
         Accelerator kernel generated
        163, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
             CC 2.0 : 19 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
        183, Sum reduction generated for numInside
    171, Loop carried scalar dependence for 'zim' at line 173
         Loop carried scalar dependence for 'zim' at line 174
         Scalar last value needed after loop for 'zim' at line 182
         Loop carried scalar dependence for 'zre' at line 173
         Loop carried scalar dependence for 'zre' at line 174
         Scalar last value needed after loop for 'zre' at line 182
         Accelerator restriction: scalar variable live-out from loop: zre
         Accelerator restriction: scalar variable live-out from loop: zim
         Inner sequential loop scheduled on accelerator
    196, Generating copyin(csetre[0:nPoints-1])
         Generating copyin(csetim[0:nPoints-1])
         Generating compute capability 2.0 binary
    197, Loop carried scalar dependence for 'numInside' at line 217
         Accelerator kernel generated
        197, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
             CC 2.0 : 19 registers; 1032 shared, 76 constant, 0 local memory bytes; 100% occupancy
        217, Sum reduction generated for numInside
    205, Loop carried scalar dependence for 'zim' at line 207
         Loop carried scalar dependence for 'zim' at line 208
         Scalar last value needed after loop for 'zim' at line 216
         Loop carried scalar dependence for 'zre' at line 207
         Loop carried scalar dependence for 'zre' at line 208
         Scalar last value needed after loop for 'zre' at line 216
         Accelerator restriction: scalar variable live-out from loop: zre
         Accelerator restriction: scalar variable live-out from loop: zim
         Inner sequential loop scheduled on accelerator


-Nick.
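On the "Sum reduction generated for numInside" lines in the output above: numInside++ is updated by every iteration of the parallel loop, so the compiler reports turning it into a reduction. The idea can be sketched serially as private partial counts that are combined at the end. This is an illustration of the concept only, not the kernel the compiler emits; reduce_count, the inside flag array, and numBlocks are hypothetical names for the example.

```c
#include <stddef.h>

/* Serial emulation of a block-wise sum reduction: each block counts the
   hits in its own slice into a private partial sum, and the partial sums
   are added together afterwards. */
long reduce_count(const int *inside, size_t n, size_t numBlocks) {
    if (numBlocks == 0) numBlocks = 1;               /* avoid dividing by zero */
    size_t chunk = (n + numBlocks - 1) / numBlocks;  /* ceiling division */
    long total = 0;

    for (size_t b = 0; b < numBlocks; b++) {
        size_t begin = b * chunk;
        size_t end = (begin + chunk < n) ? begin + chunk : n;
        long partial = 0;                            /* private to this block */
        for (size_t j = begin; j < end; j++)
            partial += inside[j] ? 1 : 0;
        total += partial;                            /* combine step */
    }
    return total;
}
```

Because the counter is an integer, the order in which partial sums are combined does not affect the total, so the reduction by itself should not make the constant and variable versions disagree.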
All times are GMT - 7 Hours