PGI User Forum


Avoid reallocating memory on the GPU.
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Mon Jan 21, 2013 1:22 pm

Quote:
Are you saying that I need to have the data copied into the device before I can do the present clause?
It has to be allocated on the device, so using "create" is fine. Alternatively, you can use "present_or_copy" (or "pcopy") instead of "present". "pcopy" will look to see if the data's already been allocated on the device, otherwise it's allocated and copied over.
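
A minimal sketch of the "pcopy" fallback, assuming a compiler that accepts the OpenACC 1.0 spelling. The names `scale` and `demo_pcopy` are made up for illustration and are not from the code in this thread. The same routine works whether or not the caller has already placed the array on the device:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scales n floats in place.
 * With an OpenACC compiler, pcopy reuses the device copy of x[0:n] if one
 * exists; otherwise it allocates, copies in, and copies back afterwards.
 * Without OpenACC the pragmas are ignored and the loop runs on the host. */
void scale(float *x, int n, float factor)
{
    assert(x != NULL && n >= 0);
    #pragma acc kernels loop pcopy(x[0:n])
    for (int i = 0; i < n; i++)
        x[i] *= factor;
}

/* Calls scale() once outside any data region and once inside one,
 * then returns the sum of the four elements. */
float demo_pcopy(void)
{
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};

    scale(a, 4, 2.0f);              /* a is not present: pcopy allocates and copies */

    #pragma acc data copy(a[0:4])   /* a is present for the duration of this region */
    {
        scale(a, 4, 0.5f);          /* a is present: pcopy reuses the device copy */
    }
    return a[0] + a[1] + a[2] + a[3];
}
```

Replacing "pcopy" with "present" in `scale` would make the first call fail at run time, because nothing has put `a` on the device at that point.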

Quote:
If that's the case that would ruin this code because function1 always uses the same type of arrays with the same size but the data on the arrays are different from function call to function call.
Note that the arrays in the "present" clause don't have to be the same array each time function1 is called. Nor do the arrays need to start at the beginning. For example, you could create a large block of memory on the device and then pass pointers into this block and have present find the proper device data. So long as the entire array is on the device, "present" will allow it.
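
To illustrate the block-plus-pointers idea, here is a small sketch. The names `add_one`, `demo_slices`, and `pool` are hypothetical, and the block uses "copy" rather than "create" only so that the results come back to the host at the end of the region:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical kernel: adds 1 to each of n floats. The caller may pass a
 * pointer to any part of a block that is already on the device. */
void add_one(float *slice, int n)
{
    assert(slice != NULL && n >= 0);
    #pragma acc kernels loop present(slice[0:n])
    for (int i = 0; i < n; i++)
        slice[i] += 1.0f;
}

/* Puts one block on the device, hands different slices of it to add_one(),
 * and returns the sum of the whole block (or -1 if allocation fails). */
float demo_slices(void)
{
    enum { SLICES = 3, SLICE_LEN = 4, TOTAL = SLICES * SLICE_LEN };
    float *pool = malloc(TOTAL * sizeof *pool);
    float sum = 0.0f;

    if (pool == NULL)
        return -1.0f;
    for (int i = 0; i < TOTAL; i++)
        pool[i] = 0.0f;

    #pragma acc data copy(pool[0:TOTAL])
    {
        /* A different slice on every call; each lies inside the block. */
        for (int s = 0; s < SLICES; s++)
            add_one(pool + s * SLICE_LEN, SLICE_LEN);

        /* The pointer does not have to start at a slice boundary. */
        add_one(pool + 2, SLICE_LEN);
    }

    for (int i = 0; i < TOTAL; i++)
        sum += pool[i];
    free(pool);
    return sum;
}
```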

- Mat
JPMN



Joined: 23 Oct 2012
Posts: 8

Posted: Tue Jan 22, 2013 8:56 am

mkcolg wrote:
It has to be allocated on the device, so using "create" is fine. Alternatively, you can use "present_or_copy" (or "pcopy") instead of "present". "pcopy" will look to see if the data's already been allocated on the device, otherwise it's allocated and copied over.


I understand all that, but the present clause still doesn't find the arrays I created earlier with the data create clause. I already had that clause in my code when I asked this question. Even with all those clauses I still get the error:
Quote:
FATAL ERROR: data in PRESENT clause was not found: name=clP


And I really don't know why. I'll post the actual code:

My main function is something like this:

Code:

int main(int argc, char * argv[]){
/*Other code here*/
int         numGammaCats=(&modelSettings[0])->numGammaCats, numChars=(&modelSettings[0])->numChars;

/*Other code here*/

#pragma acc data create(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats], clP[4*numChars*numGammaCats], tiPA[16*numGammaCats], clA[4*numChars*numGammaCats])
    {
    for (chn=0; chn<numLocalChains; chn++)
        {
        curLnL[chn] = LogLike(chn);
        curLnPr[chn] = LogPrior(chn);
        for (i=0; i<numCurrentDivisions; i++)
            {
            if (modelSettings[i].gibbsGamma == YES)
                curLnL[chn] += GibbsSampleGamma (chn, i, seed);
            }
        }
    }
}


Here LogLike is the function I referred to as functionA.
It calls another function, LaunchLogLikeForDivision,
which in turn calls one of two very similar functions that I referred to as function1.

These are the two functions:

Code:
int CondLikeDown_NUC4_OpenACC (TreeNode *p, int division, int chain)

{
   int            c, h, i, j, k, shortCut, *lState=NULL, *rState=NULL;
   CLFlt         /**clL, *clR, *clP,*/ *pL, *pR/*, *tiPL, *tiPR*/;
   ModelInfo      *m;
   
    struct timeval start, stop;
   
   
    /*OpenACC auxiliary variables*/
    int numGammaCats=0, numChars=0/*, tiP_size=0, cl_size=0*/;
   
   m = &modelSettings[division];
   
   /* flip space so that we do not overwrite old cond likes */
   FlipCondLikeSpaceOACC (m, chain, p->index);
   
   /* find conditional likelihood pointers */
   clL = m->condLikes[m->condLikeIndex[chain][p->left->index ]];
   clR = m->condLikes[m->condLikeIndex[chain][p->right->index]];
   clP = m->condLikes[m->condLikeIndex[chain][p->index       ]];
   
   /* find transition probabilities */
   pL = m->tiProbs[m->tiProbsIndex[chain][p->left->index ]];
   pR = m->tiProbs[m->tiProbsIndex[chain][p->right->index]];
   
    /* find likelihoods of site patterns for left branch if terminal */
   shortCut = 0;

   

      case 0:
            //Count time
            gettimeofday(&start, NULL);
            numGammaCats=m->numGammaCats;
            numChars=m->numChars;
            /*tiP_size=16*numGammaCats*sizeof(CLFlt);
            cl_size=4*numChars*numGammaCats*sizeof(CLFlt);*/
       tiPL = pL;
       tiPR = pR;
           
            /*Check if we need memory allocation on the GPU*/
            if (firstTime==1)
                printf("firstTime -> TRUE\n");
            else
                printf("firstTime -> FALSE\n");
           
           
       
        #pragma acc data present(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats], clP[4*numChars*numGammaCats])
        {
        #pragma acc update device(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats])

        #pragma acc kernels loop independent gang(numGammaCats)
       for (k=0; k<numGammaCats; k++)
        {
            #pragma acc loop independent gang(numChars/numGammaCats) vector(NTHREADS)/*vector(128)*/
            for (c=0; c<numChars; c++)
            {
                #pragma acc loop independent vector(4) /*gang(numGammaCats), vector(128)*/
                for(i=0; i < 4; i++){
                    register int indice = k*numChars*4+c*4;
                    register int indice2 = k*16+i*4; /* start of row i in the 4x4 matrix for category k */
                    clP[indice+i]  =  (tiPL[indice2+AA]*clL[indice+A] + tiPL[indice2+AC]*clL[indice+C] + tiPL[indice2+AG]*clL[indice+G] + tiPL[indice2+AT]*clL[indice+T])*(tiPR[indice2+AA]*clR[indice+A] + tiPR[indice2+AC]*clR[indice+C] + tiPR[indice2+AG]*clR[indice+G] + tiPR[indice2+AT]*clR[indice+T]);
                   
                }
            }
        }
           
            #pragma acc update host(clP[4*numChars*numGammaCats])
        }
       
            firstTime=0;
            printf("firstTime= %d", firstTime);
            gettimeofday(&stop,NULL);
            timersub(&stop, &start, &stop);
            timeradd(&accumulator, &stop, &accumulator);
   
    return NO_ERROR;
   
}


And:

Code:
int CondLikeRoot_NUC4_OpenACC (TreeNode *p, int division, int chain)

{
   int            a, c, h, i, j, k, shortCut, *lState=NULL, *rState=NULL, *aState=NULL;
   CLFlt         /**clL, *clR, *clP, *clA,*/ *pL, *pR, *pA/*, *tiPL, *tiPR, *tiPA*/;
   ModelInfo      *m;
   
    /*OpenACC auxiliary variables*/
    int numGammaCats=0, numChars=0;
   
    struct timeval start, stop;
   
   m = &modelSettings[division];
   
   /* flip state of node so that we are not overwriting old cond likes */
   FlipCondLikeSpaceOACC (m, chain, p->index);
   
   /* find conditional likelihood pointers */
   clL = m->condLikes[m->condLikeIndex[chain][p->left->index ]];
   clR = m->condLikes[m->condLikeIndex[chain][p->right->index]];
    clP = m->condLikes[m->condLikeIndex[chain][p->index       ]];
    clA = m->condLikes[m->condLikeIndex[chain][p->anc->index  ]];
   
   /* find transition probabilities (or calculate instead) */
   pL = m->tiProbs[m->tiProbsIndex[chain][p->left->index ]];
   pR = m->tiProbs[m->tiProbsIndex[chain][p->right->index]];
   pA = m->tiProbs[m->tiProbsIndex[chain][p->index       ]];
   
   
   shortCut = 4;
 
            //Count time
            gettimeofday(&start, NULL);
            numGammaCats=m->numGammaCats;
            numChars=m->numChars;
            tiPL = pL;
            tiPR = pR;
            tiPA = pA;
            #pragma acc data present(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats], tiPA[16*numGammaCats], clA[4*numChars*numGammaCats], clP[4*numChars*numGammaCats])
            {
            #pragma acc update device(tiPL[16*numGammaCats], clL[4*numChars*numGammaCats], tiPR[16*numGammaCats], clR[4*numChars*numGammaCats], tiPA[16*numGammaCats], clA[4*numChars*numGammaCats])
           
            #pragma acc kernels loop independent gang(numGammaCats)
            for (k=0; k<numGammaCats; k++)
         {
                #pragma acc loop independent gang(numChars/numGammaCats) vector(NTHREADS)/*vector(128)*/
                for (c=0; c<numChars; c++)
             {
                    #pragma acc loop independent vector(4) /*gang(numGammaCats), vector(128)*/
                    for (i=0; i<4; i++) {
                        register int indice = k*numChars*4+c*4;
                        register int indice2 = k*16+i*4; /* start of row i in the 4x4 matrix for category k */
                        clP[indice+i] =   (tiPL[indice2+AA]*clL[indice+A] + tiPL[indice2+AC]*clL[indice+C] + tiPL[indice2+AG]*clL[indice+G] + tiPL[indice2+AT]*clL[indice+T])
                        *(tiPR[indice2+AA]*clR[indice+A] + tiPR[indice2+AC]*clR[indice+C] + tiPR[indice2+AG]*clR[indice+G] + tiPR[indice2+AT]*clR[indice+T])
                        *(tiPA[indice2+AA]*clA[indice+A] + tiPA[indice2+AC]*clA[indice+C] + tiPA[indice2+AG]*clA[indice+G] + tiPA[indice2+AT]*clA[indice+T]);
                    }
            }
         }
            #pragma acc update host(clP[4*numChars*numGammaCats])
            }
            gettimeofday(&stop,NULL);
            timersub(&stop, &start, &stop);
            timeradd(&accumulator, &stop, &accumulator);
           
       
   
   return NO_ERROR;
   
}


These last two functions are in the mbopenacc.c file; main, LogLike and LaunchLogLikeForDivision are in a separate file. The arrays *clL, *clR, *clP, *tiPL, *tiPR, *clA, *tiPA are global variables declared in the mbopenacc.h file.

What am I doing wrong?
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Wed Jan 23, 2013 8:23 am

Hi JPMN,

clP and the other pointers point into larger arrays. "present" simply looks up where the device arrays are located by their association with the host arrays, so once you re-assign the pointers, they are no longer associated with the earlier "create". Hence what needs to be created on, or copied to, the device are the "m->condLikes" and "m->tiProbs" arrays.
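
One way to picture the lookup is as a table of host address ranges. The following is a plain C toy model, not the actual runtime implementation; `PresentEntry`, `is_present`, `demo_lookup`, and the array names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one table entry: a host address range that has a device copy. */
typedef struct {
    uintptr_t host_start;
    size_t    bytes;
} PresentEntry;

/* Returns 1 if [ptr, ptr + bytes) lies entirely inside the recorded range. */
int is_present(const PresentEntry *e, const void *ptr, size_t bytes)
{
    uintptr_t p = (uintptr_t)ptr;
    assert(e != NULL);
    return p >= e->host_start && p + bytes <= e->host_start + e->bytes;
}

/* Mimics the situation in the thread. Returns 1 when the re-assigned pointer
 * is NOT found under the old entry but IS found under the large array's entry. */
int demo_lookup(void)
{
    static float scratch[8];      /* wherever clP pointed when "create" ran */
    static float cond_likes[64];  /* stand-in for the larger array */
    const size_t n = 8 * sizeof(float);

    float *clP = scratch;
    PresentEntry created = { (uintptr_t)clP, n };   /* like create(clP[0:8]) */

    clP = &cond_likes[16];                          /* pointer re-assigned later */
    int found_old = is_present(&created, clP, n);   /* address no longer matches */

    /* Like create(cond_likes[0:64]) in the main data region instead. */
    PresentEntry big = { (uintptr_t)cond_likes, sizeof cond_likes };
    int found_big = is_present(&big, clP, n);       /* address falls inside the range */

    return !found_old && found_big;
}
```

The point of the model is that the variable name plays no role in the lookup: only the host address and range are compared against what was registered.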

- Mat
JPMN



Joined: 23 Oct 2012
Posts: 8

Posted: Thu Jan 24, 2013 6:07 am

Hi mkcolg.

Thank you very much for your answer. So the problem here is that I can't re-assign the pointer that I passed in the create clause.

Is there any way in OpenACC to tell the compiler to transfer an array named X on the CPU to those smaller arrays on the GPU (clP, etc.)?

Or is the solution to pass the entire larger array at the beginning and then tell the compiler which part of the array it's going to access with the present clause? For example, if I wanted to access only positions 30 to 40, I would do something like:
Code:
#pragma acc data present(tiProbs[30:40])

If this is the case, would I then have to change the code in CondLikeDown_NUC4_OpenACC and CondLikeRoot_NUC4_OpenACC? I mean, the computations inside the for loop that use these names would no longer work, right?
Code:
clP[indice+i]  =  (tiPL[indice2+AA]*clL[indice+A] + tiPL[indice2+AC]*clL[indice+C] + tiPL[indice2+AG]*clL[indice+G] + tiPL[indice2+AT]*clL[indice+T])*(tiPR[indice2+AA]*clR[indice+A] + tiPR[indice2+AC]*clR[indice+C] + tiPR[indice2+AG]*clR[indice+G] + tiPR[indice2+AT]*clR[indice+T]);

Would I have to change the names of clP and the other pointers?
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Thu Jan 24, 2013 2:40 pm

OK, let's back up a bit, since you're not quite getting this.

When you enter the data region in the main routine, you allocate memory on the device and also associate a host pointer (and its range) with the device pointer (and its range). When you use the present clause, the host pointer is used to determine which device pointer to use.

So what you need to do is allocate the larger arrays on the device in the main data region. You can then assign the local host pointers (clP) to a location in the large arrays, and put "clP" in the present clause with its range. clP's host address is then used to find the corresponding location in the device copy of the large array.
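
As a rough sketch of that sequence, using hypothetical names (`double_range`, `demo_range`, `big`) rather than the MrBayes code. It also assumes the standard OpenACC C subarray notation `[start:length]`, under which elements 30 through 40 are written `[30:11]`, since `[30:40]` would mean 40 elements starting at index 30:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one call of function1: doubles n values that
 * live somewhere inside a larger array already allocated on the device. */
void double_range(float *part, int n)
{
    assert(part != NULL && n >= 0);
    #pragma acc data present(part[0:n])
    {
        /* Host values change between calls, so refresh the device copy. */
        #pragma acc update device(part[0:n])

        #pragma acc kernels loop
        for (int i = 0; i < n; i++)
            part[i] *= 2.0f;

        #pragma acc update host(part[0:n])
    }
}

/* Allocates the large array on the device once, then calls double_range()
 * twice on elements 30..40 with a host-side change in between. */
float demo_range(void)
{
    float big[64];
    for (int i = 0; i < 64; i++)
        big[i] = 1.0f;

    #pragma acc data create(big[0:64])   /* allocate once, no implicit copies */
    {
        float *part = &big[30];          /* local pointer into the large array */
        double_range(part, 11);          /* elements 30..40: start 30, length 11 */
        big[30] = 5.0f;                  /* new host data before the next call */
        double_range(part, 11);
    }
    return big[29] + big[30] + big[40] + big[41];
}
```

The loop body keeps indexing through the local pointer exactly as before; only the data clauses refer to ranges, so the names used inside the kernel do not need to change.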

- Mat