PGI User Forum


Avoid reallocating memory on the GPU.
JPMN



Joined: 23 Oct 2012
Posts: 8

Posted: Fri Jan 25, 2013 9:25 am

OK, let's see if I've got it this time.

I need to allocate the larger arrays in main, and this is what I've got:
Code:
int main(int argc, char * argv[]){
   /*Other code here*/
   int numGammaCats=(&modelSettings[0])->numGammaCats, numChars=(&modelSettings[0])->numChars;
   ModelInfo *m = &modelSettings[0];
   int tiProbs_x=m->numTiProbs, tiProbs_y=m->tiProbLength, condLikes_x=m->numCondLikes, condLikes_y=m->condLikeLength;

   /*Other code here*/

   tiProbs_gpu=m->tiProbs;
   condLikes_gpu=m->condLikes;
   #pragma acc data copyin(tiProbs_gpu[tiProbs_x][tiProbs_y]), copy(condLikes_gpu[condLikes_x][condLikes_y])
   {
      for (chn=0; chn<numLocalChains; chn++)
      {
         curLnL[chn] = LogLike(chn);
         curLnPr[chn] = LogPrior(chn);
         for (i=0; i<numCurrentDivisions; i++)
         {
            if (modelSettings[i].gibbsGamma == YES)
               curLnL[chn] += GibbsSampleGamma (chn, i, seed);
         }
      }
   }
}


Then I should point clP and the other pointers at parts of the larger arrays, so I do this:
Code:
/* find conditional likelihood pointers */
   clL = condLikes_gpu[m->condLikeIndex[chain][p->left->index ]];
   clR = condLikes_gpu[m->condLikeIndex[chain][p->right->index]];
   clP = condLikes_gpu[m->condLikeIndex[chain][p->index       ]];
   
   /* find transition probabilities */
   pL = tiProbs_gpu[m->tiProbsIndex[chain][p->left->index ]];
   pR = tiProbs_gpu[m->tiProbsIndex[chain][p->right->index]];


And the two functions look like this:
Code:
int CondLikeDown_NUC4_OpenACC (TreeNode *p, int division, int chain)

{
   int            c, h, i, j, k, shortCut, *lState=NULL, *rState=NULL;
   CLFlt         /**clL, *clR, *clP,*/ *pL, *pR/*, *tiPL, *tiPR*/;
   ModelInfo      *m;
   
   struct timeval start, stop;

   /*OpenACC auxiliary variables*/
   int numGammaCats=0, numChars=0/*, tiP_size=0, cl_size=0*/;
   
   m = &modelSettings[division];
   
   /* flip space so that we do not overwrite old cond likes */
   FlipCondLikeSpaceOACC (m, chain, p->index);
   
   /* find conditional likelihood pointers */
   clL = condLikes_gpu[m->condLikeIndex[chain][p->left->index ]];
   clR = condLikes_gpu[m->condLikeIndex[chain][p->right->index]];
   clP = condLikes_gpu[m->condLikeIndex[chain][p->index       ]];
   
   /* find transition probabilities */
   pL = tiProbs_gpu[m->tiProbsIndex[chain][p->left->index ]];
   pR = tiProbs_gpu[m->tiProbsIndex[chain][p->right->index]];
   
   /* find likelihoods of site patterns for left branch if terminal */
   shortCut = 0;

   /* case 0: (the enclosing switch on shortCut is not shown in this excerpt) */
   //Count time
   gettimeofday(&start, NULL);
   numGammaCats=m->numGammaCats;
   numChars=m->numChars;
   /*tiP_size=16*numGammaCats*sizeof(CLFlt);
   cl_size=4*numChars*numGammaCats*sizeof(CLFlt);*/
   tiPL = pL;
   tiPR = pR;

   /*Check if we need memory allocation on the GPU*/
   if (firstTime==1)
      printf("firstTime -> TRUE\n");
   else
      printf("firstTime -> FALSE\n");

   #pragma acc data present(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats], clP[4*numChars*numGammaCats])
   {
      #pragma acc kernels loop independent gang(numGammaCats)
      for (k=0; k<numGammaCats; k++)
      {
         #pragma acc loop independent gang(numChars/numGammaCats) vector(NTHREADS)/*vector(128)*/
         for (c=0; c<numChars; c++)
         {
            #pragma acc loop independent vector(4) /*gang(numGammaCats), vector(128)*/
            for (i=0; i<4; i++) {
               register int indice = k*numChars*4+c*4;
               register int indice2 = k*16+i*4;  /* start of row i of the 4x4 matrix for category k */
               clP[indice+i] =  (tiPL[indice2+AA]*clL[indice+A] + tiPL[indice2+AC]*clL[indice+C] + tiPL[indice2+AG]*clL[indice+G] + tiPL[indice2+AT]*clL[indice+T])
                               *(tiPR[indice2+AA]*clR[indice+A] + tiPR[indice2+AC]*clR[indice+C] + tiPR[indice2+AG]*clR[indice+G] + tiPR[indice2+AT]*clR[indice+T]);
            }
         }
      }
   }

   firstTime=0;
   printf("firstTime= %d\n", firstTime);
   gettimeofday(&stop,NULL);
   timersub(&stop, &start, &stop);
   timeradd(&accumulator, &stop, &accumulator);

   return NO_ERROR;
}
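A serial CPU version of the down-pass kernel is handy for checking what the GPU produces. The sketch below is not from the original post; it assumes A, C, G, T = 0..3 (state index within a site) and AA, AC, AG, AT = 0..3 (column offset within a matrix row), one row-major 4x4 transition matrix (16 values) per gamma category, and 4 values per site per category in each likelihood array. CondLikeDownRef and RowTimesSite are hypothetical names.

```c
#include <assert.h>

typedef float CLFlt;

/* Assumed values for the state and column-offset constants used in the kernel. */
enum { A = 0, C = 1, G = 2, T = 3 };
enum { AA = 0, AC = 1, AG = 2, AT = 3 };

/* Dot product of one matrix row (starting at tiP[row]) with one site's 4 states. */
static CLFlt RowTimesSite(const CLFlt *tiP, int row, const CLFlt *site)
{
    return tiP[row+AA]*site[A] + tiP[row+AC]*site[C]
         + tiP[row+AG]*site[G] + tiP[row+AT]*site[T];
}

/* Serial reference for the down pass: for each gamma category k, site c and
   parent state i, multiply the left-branch term by the right-branch term. */
void CondLikeDownRef(int numGammaCats, int numChars,
                     const CLFlt *tiPL, const CLFlt *clL,
                     const CLFlt *tiPR, const CLFlt *clR, CLFlt *clP)
{
    assert(numGammaCats > 0 && numChars > 0);
    for (int k = 0; k < numGammaCats; k++)
        for (int c = 0; c < numChars; c++)
            for (int i = 0; i < 4; i++) {
                int indice  = k*numChars*4 + c*4;  /* site c of category k */
                int indice2 = k*16 + i*4;          /* row i of matrix k */
                clP[indice+i] = RowTimesSite(tiPL, indice2, clL + indice)
                              * RowTimesSite(tiPR, indice2, clR + indice);
            }
}
```

With one gamma category and one site, passing a permutation matrix that reverses the states as tiPL and the identity as tiPR (with clR all ones) returns clL in reverse order in clP, which is a quick way to catch a wrong row offset.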


And this:
Code:
int CondLikeRoot_NUC4_OpenACC (TreeNode *p, int division, int chain)

{
   int            a, c, h, i, j, k, shortCut, *lState=NULL, *rState=NULL, *aState=NULL;
   CLFlt         /**clL, *clR, *clP, *clA,*/ *pL, *pR, *pA/*, *tiPL, *tiPR, *tiPA*/;
   ModelInfo      *m;
   
   /*OpenACC auxiliary variables*/
   int numGammaCats=0, numChars=0;

   struct timeval start, stop;

   m = &modelSettings[division];

   /* flip state of node so that we are not overwriting old cond likes */
   FlipCondLikeSpaceOACC (m, chain, p->index);

   /* find conditional likelihood pointers */
   clL = condLikes_gpu[m->condLikeIndex[chain][p->left->index ]];
   clR = condLikes_gpu[m->condLikeIndex[chain][p->right->index]];
   clP = condLikes_gpu[m->condLikeIndex[chain][p->index       ]];
   clA = condLikes_gpu[m->condLikeIndex[chain][p->anc->index  ]];

   /* find transition probabilities (or calculate instead) */
   pL = tiProbs_gpu[m->tiProbsIndex[chain][p->left->index ]];
   pR = tiProbs_gpu[m->tiProbsIndex[chain][p->right->index]];
   pA = tiProbs_gpu[m->tiProbsIndex[chain][p->index       ]];

   shortCut = 4;

   //Count time
   gettimeofday(&start, NULL);
   numGammaCats=m->numGammaCats;
   numChars=m->numChars;
   tiPL = pL;
   tiPR = pR;
   tiPA = pA;
   #pragma acc data present(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats], tiPA[16*numGammaCats], clA[4*numChars*numGammaCats], clP[4*numChars*numGammaCats])
   {
      #pragma acc kernels loop independent gang(numGammaCats)
      for (k=0; k<numGammaCats; k++)
      {
         #pragma acc loop independent gang(numChars/numGammaCats) vector(NTHREADS)/*vector(128)*/
         for (c=0; c<numChars; c++)
         {
            #pragma acc loop independent vector(4) /*gang(numGammaCats), vector(128)*/
            for (i=0; i<4; i++) {
               register int indice = k*numChars*4+c*4;
               register int indice2 = k*16+i*4;  /* start of row i of the 4x4 matrix for category k */
               clP[indice+i] =  (tiPL[indice2+AA]*clL[indice+A] + tiPL[indice2+AC]*clL[indice+C] + tiPL[indice2+AG]*clL[indice+G] + tiPL[indice2+AT]*clL[indice+T])
                               *(tiPR[indice2+AA]*clR[indice+A] + tiPR[indice2+AC]*clR[indice+C] + tiPR[indice2+AG]*clR[indice+G] + tiPR[indice2+AT]*clR[indice+T])
                               *(tiPA[indice2+AA]*clA[indice+A] + tiPA[indice2+AC]*clA[indice+C] + tiPA[indice2+AG]*clA[indice+G] + tiPA[indice2+AT]*clA[indice+T]);
            }
         }
      }
   }
   gettimeofday(&stop,NULL);
   timersub(&stop, &start, &stop);
   timeradd(&accumulator, &stop, &accumulator);

   return NO_ERROR;
}
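CondLikeRoot_NUC4_OpenACC follows the same pattern as the down pass, with a third factor for the ancestor branch (tiPA, clA). A serial reference can therefore take the branches as arrays and multiply as many branch terms as are given. This sketch is not from the original post; CondLikeProductRef is a hypothetical name, and the constant values and memory layout are the same assumptions as for a down-pass reference (A..T and AA..AT = 0..3, one row-major 4x4 matrix per gamma category).

```c
#include <assert.h>

typedef float CLFlt;

/* Assumed values for the state and column-offset constants used in the kernels. */
enum { A = 0, C = 1, G = 2, T = 3 };
enum { AA = 0, AC = 1, AG = 2, AT = 3 };

/* Serial reference for a node whose likelihood is a product of branch terms:
   2 branches (left, right) for the down pass, 3 (left, right, ancestor) for
   the root. tiP[b] and cl[b] are the matrices and likelihoods of branch b. */
void CondLikeProductRef(int nBranches, int numGammaCats, int numChars,
                        const CLFlt *const tiP[], const CLFlt *const cl[],
                        CLFlt *clP)
{
    assert(nBranches > 0);
    for (int k = 0; k < numGammaCats; k++)
        for (int c = 0; c < numChars; c++)
            for (int i = 0; i < 4; i++) {
                int indice  = k*numChars*4 + c*4;  /* site c of category k */
                int indice2 = k*16 + i*4;          /* row i of matrix k */
                CLFlt prod = 1.0f;
                for (int b = 0; b < nBranches; b++) {
                    const CLFlt *t = tiP[b] + indice2;
                    const CLFlt *v = cl[b] + indice;
                    prod *= t[AA]*v[A] + t[AC]*v[C] + t[AG]*v[G] + t[AT]*v[T];
                }
                clP[indice+i] = prod;
            }
}
```

Passing {tiPL, tiPR, tiPA} and {clL, clR, clA} with nBranches = 3 mirrors the root kernel; with nBranches = 2 it mirrors the down pass.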


However, I still get the runtime error:
Quote:
FATAL ERROR: data in PRESENT clause was not found: name=clP


So what am I doing wrong, or what am I not getting this time?
By the way, the variables condLikes_gpu and tiProbs_gpu are declared as globals in mbopenacc.h.
mkcolg



Joined: 30 Jun 2004
Posts: 6134
Location: The Portland Group Inc.

Posted: Mon Jan 28, 2013 12:07 pm

Quote:
So what am I doing wrong or not getting this time?
You have the right idea. My guess is that in the outer data region in main, the "tiProbs_gpu" and "condLikes_gpu" data are copies of the member arrays from the first element of the "modelSettings" array. However, in your routines, you're using the arrays from "modelSettings[division]". Unless "division" is zero, this is not the same data that was copied in the data region.
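The diagnosis can be illustrated with a small host-only toy model. This is not how the PGI runtime is implemented; it only mimics the idea that a data region records host address ranges and that a present check succeeds only when the requested range lies inside one of them. ToyEnterData, ToyPresent and the fixed-size table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a table of host address ranges that have "entered" a data region. */
#define MAX_RANGES 16

typedef struct { uintptr_t start; size_t bytes; } HostRange;

static HostRange ranges[MAX_RANGES];
static int numRanges = 0;

/* Record [p, p+bytes), as entering a region with copyin/copy would in this model. */
void ToyEnterData(const void *p, size_t bytes)
{
    assert(numRanges < MAX_RANGES);
    ranges[numRanges].start = (uintptr_t)p;
    ranges[numRanges].bytes = bytes;
    numRanges++;
}

/* Return 1 if [p, p+bytes) lies inside a recorded range, else 0 (the case the
   real runtime reports as "data in PRESENT clause was not found").
   Only addresses are compared; the variable name plays no role. */
int ToyPresent(const void *p, size_t bytes)
{
    uintptr_t q = (uintptr_t)p;
    for (int r = 0; r < numRanges; r++)
        if (q >= ranges[r].start && q + bytes <= ranges[r].start + ranges[r].bytes)
            return 1;
    return 0;
}
```

For example, after ToyEnterData(division0, sizeof division0), a pointer into division0 passes ToyPresent, while a pointer into a separately allocated division1 does not, which mirrors clP pointing into modelSettings[division] when only modelSettings[0]'s arrays were copied.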

What would work well here is an unstructured data region. Instead of a region with a well-defined start and stop, you could call an initialization routine that loops over the struct elements and creates a data region for each element's member arrays. Since only the addresses are associated, the name of the particular array would be irrelevant. However, this feature is still just a proposal for the OpenACC 2.0 specification (http://www.openacc.org/sites/default/files/Proposed%20Additions%20for%20OpenACC%202.pdf), so it won't be implemented until later this year.
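In the OpenACC 2.0 specification as published, unstructured data regions are written with the `enter data` and `exit data` directives. A hedged sketch of the initialization loop described above, assuming a compiler with OpenACC 2.0 support; DivisionData, InitDeviceData, ScaleCondLikes and FreeDeviceData are hypothetical simplifications (one flat block per division rather than the pointer arrays in ModelInfo, and a trivial scaling kernel standing in for the CondLike routines). Without OpenACC enabled, the pragmas are ignored and everything runs on the host.

```c
#include <assert.h>

typedef float CLFlt;

/* Hypothetical, simplified stand-in for ModelInfo: one flat block of transition
   probabilities and one of conditional likelihoods per data division. */
typedef struct {
    CLFlt *tiProbs;   int tiLen;
    CLFlt *condLikes; int clLen;
} DivisionData;

/* Map every division's arrays to the device, one division per iteration.
   An "enter data" mapping outlives this function, which a structured
   "#pragma acc data { ... }" region cannot do. */
void InitDeviceData(DivisionData *d, int numDivisions)
{
    assert(d != 0 && numDivisions > 0);
    for (int i = 0; i < numDivisions; i++) {
        CLFlt *tp = d[i].tiProbs, *cl = d[i].condLikes;
        int tn = d[i].tiLen, cn = d[i].clLen;
        #pragma acc enter data copyin(tp[0:tn], cl[0:cn])
    }
}

/* Stand-in for a CondLike routine: the arrays are found by host address,
   whichever division index is passed in. */
void ScaleCondLikes(DivisionData *d, int division, CLFlt s)
{
    CLFlt *cl = d[division].condLikes;
    int n = d[division].clLen;
    #pragma acc parallel loop present(cl[0:n])
    for (int j = 0; j < n; j++)
        cl[j] *= s;
}

/* Copy the likelihoods back to the host and release all device copies. */
void FreeDeviceData(DivisionData *d, int numDivisions)
{
    for (int i = 0; i < numDivisions; i++) {
        CLFlt *tp = d[i].tiProbs, *cl = d[i].condLikes;
        int tn = d[i].tiLen, cn = d[i].clLen;
        #pragma acc exit data delete(tp[0:tn]) copyout(cl[0:cn])
    }
}
```

Calling InitDeviceData once before the chain loop in main and FreeDeviceData after it would take the place of the structured region that only covered modelSettings[0].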

In the meantime, you may need to take the performance hit and copy the member arrays each time they are needed, or reorganize your data into coalesced blocks (i.e., convert your array of structs into a struct of arrays).
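One way to apply the struct-of-arrays suggestion is to pack every division's conditional likelihoods into a single contiguous buffer plus an offset table, so that one structured data region in main covers all divisions. A minimal sketch under that assumption; PackedCondLikes, PackCondLikes, ScaleDivision and FreePacked are hypothetical names, ScaleDivision stands in for the CondLike kernels, and the same packing would apply to the transition probabilities.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef float CLFlt;

/* Hypothetical packed layout: all divisions back to back in one buffer. */
typedef struct {
    CLFlt *all;          /* every division's block, contiguous */
    int   *offset;       /* offset[d] = first element of division d; offset[n] = total */
    int    numDivisions;
} PackedCondLikes;

/* Pack n arrays (src[d] holds len[d] elements) into one buffer.
   Returns 0 on success, -1 if an allocation fails. */
int PackCondLikes(PackedCondLikes *p, CLFlt *const src[], const int len[], int n)
{
    assert(n > 0);
    p->offset = malloc((size_t)(n + 1) * sizeof *p->offset);
    if (p->offset == NULL) return -1;
    p->offset[0] = 0;
    for (int d = 0; d < n; d++)
        p->offset[d + 1] = p->offset[d] + len[d];
    p->all = malloc((size_t)p->offset[n] * sizeof *p->all);
    if (p->all == NULL) { free(p->offset); return -1; }
    for (int d = 0; d < n; d++)
        memcpy(p->all + p->offset[d], src[d], (size_t)len[d] * sizeof *p->all);
    p->numDivisions = n;
    return 0;
}

/* Works on one division through the shared buffer. Inside a region such as
   "#pragma acc data copy(all[0:total])" in main, this present clause holds
   for every division, since each is a subrange of the same mapped buffer. */
void ScaleDivision(PackedCondLikes *p, int division, CLFlt s)
{
    assert(division >= 0 && division < p->numDivisions);
    CLFlt *all = p->all;
    int total = p->offset[p->numDivisions];
    int lo = p->offset[division], hi = p->offset[division + 1];
    #pragma acc parallel loop present(all[0:total])
    for (int j = lo; j < hi; j++)
        all[j] *= s;
}

void FreePacked(PackedCondLikes *p)
{
    free(p->all);
    free(p->offset);
}
```

The design choice here is that the division index only selects an offset range; it no longer selects which host allocation the kernel touches, so the single data region in main stays valid whatever value division has.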

- Mat
All times are GMT - 7 Hours