PGI User Forum

CUDA-x86.

Timing CUDA-x86 binary

 
sergioz



Joined: 02 Apr 2012
Posts: 2

Posted: Thu Apr 05, 2012 4:53 am    Post subject: Timing CUDA-x86 binary

Hi,

I'm trying to compare our CUDA implementation of a neural network algorithm with a multi-CPU implementation generated by the CUDA-x86 compiler. It compiled without problems, but when I execute the generated binary I get some strange timing results. For example:

I added some printf calls to show how long each iteration takes to reduce the network, and sometimes the result is printed and sometimes it is not. What timing library would you recommend for measuring the performance of a multi-CPU binary generated by your CUDA-x86 compiler?

Many thanks in advance, Sergio
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Thu Apr 05, 2012 2:09 pm

Hi Sergio,

Quote:
I added some printf calls to show how long each iteration takes to reduce the network, and sometimes the result is printed and sometimes it is not
Sorry, I'd need an example to better understand what you're doing. Perhaps you're not blocking between the kernel launch and the timing call? Kernels are launched asynchronously, so a host-side timer can be read before the kernel has finished, which causes problems when timing from the host.
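
The blocking step mentioned above can be sketched as follows. This is only an illustration, not code from the thread: the kernel `scale` and the helper `time_scale_host_ms` are hypothetical names, the timer is `std::chrono` (which assumes a C++11-capable toolchain), and `cudaDeviceSynchronize` assumes a CUDA 4.0 or later runtime (older runtimes used `cudaThreadSynchronize`). It has not been checked against the CUDA-x86 compiler.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; not part of the original post.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Times one launch of scale() with a host-side timer and returns
// the elapsed wall-clock time in milliseconds.
double time_scale_host_ms(float *d_data, int n)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaDeviceSynchronize();  // drain any earlier work before starting the timer
    auto t0 = std::chrono::steady_clock::now();

    scale<<<blocks, threads>>>(d_data, n, 2.0f);  // the launch returns immediately
    cudaDeviceSynchronize();  // block the host until the kernel has finished

    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Without the second `cudaDeviceSynchronize()`, the timer would mostly measure the cost of queuing the launch rather than the kernel itself.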

Quote:
What timing library would you recommend for measuring the performance of a multi-CPU binary generated by your CUDA-x86 compiler?
I typically use CUDA events to do internal timings of such things as kernel execution and data movement.
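
As a rough illustration of event-based timing, the sketch below records three events so that the host-to-device copy and the kernel are timed separately. The kernel `add_one`, the struct `Timings`, and the function `time_copy_and_kernel` are hypothetical names introduced only for this example; the event calls are the standard CUDA runtime ones, and return codes are ignored here for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; not part of the original post.
__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

struct Timings { float copy_ms; float kernel_ms; };

// Copies 'host' to the device buffer 'd_data', runs add_one on it, and
// returns the time spent in the copy and in the kernel, in milliseconds.
Timings time_copy_and_kernel(const std::vector<float> &host, float *d_data)
{
    const int n = static_cast<int>(host.size());

    cudaEvent_t start, afterCopy, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&afterCopy);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_data, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaEventRecord(afterCopy, 0);
    add_one<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaEventRecord(stop, 0);

    // Wait until the 'stop' event has actually been reached before
    // asking for elapsed times.
    cudaEventSynchronize(stop);

    Timings t;
    cudaEventElapsedTime(&t.copy_ms, start, afterCopy);
    cudaEventElapsedTime(&t.kernel_ms, afterCopy, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(afterCopy);
    cudaEventDestroy(stop);
    return t;
}
```

A caller would allocate `d_data` with `cudaMalloc` for at least `host.size()` floats and print the two fields of the returned struct.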

- Mat
sergioz




Posted: Fri Apr 06, 2012 4:23 am

Thanks for your reply.

I've tried using CUDA events and I see the same strange behaviour: only some iterations of a loop print their result.

Code:

function c++

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);

    unsigned int bytes = size * sizeof(Neurona);
    cutilSafeCallNoSync( cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice) );

    float COEF_GANADORA = (float)0.1;
    float COEF_VECINAS = (float)0.01;
    int UMBRAL_DATOS = 127;

    cudaMemcpyToSymbol("COEF_GANADORA", &COEF_GANADORA, sizeof(float));
    cudaMemcpyToSymbol("COEF_VECINAS", &COEF_VECINAS, sizeof(float));

    int aux;
    if (isPow2(size))
    {
        cudaMemcpyToSymbol("size", &size, sizeof(size));
    }
    else
    {
        aux = nextPow2(size);
        cudaMemcpyToSymbol("size", &aux, sizeof(aux));
    }

    ....
    ....

    for (entrada = 0; entrada < NUM_ENTRADAS; entrada++)
    {
        al = rand() % (NUM_PUNTOS);

        x = nube_puntos[al*3];
        y = nube_puntos[al*3+1];
        z = nube_puntos[al*3+2];

        reduceMinNeurona3_Min2<<<dimGrid>>>(d_idata, d_odata, x, y, z);

        ajustarPesosGanadora<<<1>>>(d_matVecinas, d_idata, d_odata, numBlocks, x, y, z);
    }

    cudaMemcpy(h_matVecinas, d_matVecinas, sizeof(auxVECINA)*((size*100)+size), cudaMemcpyDeviceToHost);

    cutilSafeCall( cudaMemcpy(h_idata, d_idata, bytes, cudaMemcpyDeviceToHost) );

    float elapsedTime;
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&elapsedTime, start, stop);
    printf("Time with %d neurons %3.1f ms \n", size, elapsedTime);


This function is executed in a loop about M times, but the message is only printed sometimes, and which iterations print varies from one execution to the next. It's very strange. I'm compiling with this command:

Code:

pgcpp -Mcudax86 -m64 -o ../../bin//release/cuGNG_base obj/x86_64/release/tmapas.cpp.o obj/x86_64/release/cuGNG3D.cu.o -L../../lib -L../../common/lib/ -L../../shared/lib -lcutil_x86_64 -lshrutil_x86_64


Many thanks in advance, Sergio
mkcolg




Posted: Fri Apr 06, 2012 8:15 am

Hi Sergio,

I don't see anything wrong here, but without a complete example it's hard to tell what is wrong. Can you either post the complete code or send it to PGI Customer Support (trs@pgroup.com)? Ask them to forward it to me and I'll take a look.
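
While putting together a complete example, one general way to narrow down this kind of problem is to check the status of every runtime call and kernel launch, since several calls in the posted function ignore their return codes. The sketch below is an assumption-laden illustration rather than a diagnosis: `cuda_ok`, `CUDA_OK`, `fill`, and `run_checked` are hypothetical names, and `cudaDeviceSynchronize` assumes a CUDA 4.0 or later runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: reports a failed CUDA runtime call with its source
// location and returns true only when the call succeeded.
static bool cuda_ok(cudaError_t err, const char *what, const char *file, int line)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s:%d: %s failed: %s\n", file, line, what, cudaGetErrorString(err));
    return false;
}
#define CUDA_OK(call) cuda_ok((call), #call, __FILE__, __LINE__)

// Hypothetical stand-in kernel; not part of the original post.
__global__ void fill(int *out, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Launches fill() and checks both the launch and its execution.
bool run_checked(int *d_out, int n)
{
    fill<<<(n + 127) / 128, 128>>>(d_out, n, 7);
    if (!CUDA_OK(cudaGetLastError())) return false;       // errors from the launch itself
    if (!CUDA_OK(cudaDeviceSynchronize())) return false;  // errors raised while the kernel ran
    return true;
}
```

Messages go to stderr, which is normally unbuffered, so a reported failure should appear even if regular printf output is delayed.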

Thanks,
Mat
All times are GMT - 7 Hours