PGI User Forum


update directive - how to?

 
PaulPa



Joined: 02 Aug 2012
Posts: 35

Posted: Mon Nov 26, 2012 1:37 pm    Post subject: update directive - how to?

Hi Mat,

I'm having some problems with the update directive within a data region.

Here is a little sample code:
Code:

   int n = 1024;
   FloatType* out = (FloatType*) malloc(sizeof(FloatType) * n);
   FloatType* in = (FloatType*) malloc(sizeof(FloatType) * n);

   int i;
   for(i = 0 ; i < n; ++i)
      in[i] = 1.0;
   
#pragma acc data create(in[0:n],out[0:n])
   {   
#pragma update device(in[0:n]) // copy data from host to device
#pragma acc kernels present(in[0:n],out[0:n])
      for(i = 0 ; i < n; ++i)
         out[i] = in[i] + 1.0;
#pragma update host(out[0:n]) // copy data from device to host
   }   
   printf("%f %f \n",out[0],out[1]);


Of course, this code does not do anything useful, but it illustrates my use case.

PGI 12.9 generates the following feedback:
Code:

     31, Generating create(out[0:n])
         Generating create(in[0:n])
     34, Generating present(out[0:n])
         Generating present(in[0:n])
         Generating compute capability 2.0 binary
     35, Loop is parallelizable
         Accelerator kernel generated
         35, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
             CC 2.0 : 12 registers; 0 shared, 52 constant, 0 local memory bytes


I expect the output "2.0 2.0", but I get "0.0 0.0".

Thanks.

Best,
Paul
mkcolg



Joined: 30 Jun 2004
Posts: 6120
Location: The Portland Group Inc.

Posted: Mon Nov 26, 2012 3:12 pm    Post subject: update directive - how to?

Hi Paul,

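You forgot the "acc" before "update". Without it, "#pragma update" is not an OpenACC directive, so no data is copied between host and device. That is also why the feedback in your post shows no "Generating update" messages.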
Code:

   int n = 1024;
   FloatType* out = (FloatType*) malloc(sizeof(FloatType) * n);
   FloatType* in = (FloatType*) malloc(sizeof(FloatType) * n);

   int i;
   for(i = 0 ; i < n; ++i)
      in[i] = 1.0;
   
#pragma acc data create(in[0:n],out[0:n])
   {   
#pragma acc update device(in[0:n]) // copy data from host to device
#pragma acc kernels present(in[0:n],out[0:n])
      for(i = 0 ; i < n; ++i)
         out[i] = in[i] + 1.0;
#pragma acc update host(out[0:n]) // copy data from device to host
   }   
   printf("%f %f \n",out[0],out[1]);


Hope this helps,
Mat
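
For reference, a self-contained version of the corrected snippet might look like the sketch below. It assumes that FloatType is a typedef for double (the original typedef is not shown), and the function name add_one_on_device is made up for illustration. When OpenACC is not enabled, the pragmas are ignored and the loop simply runs on the host, so the result should be the same either way.

```c
#include <stddef.h>

typedef double FloatType; /* assumption: the original typedef is not shown */

/* Hypothetical helper: fills in[] with 1.0, computes out[i] = in[i] + 1.0
 * inside a data region, and returns 0 on success or -1 on bad arguments. */
int add_one_on_device(FloatType *in, FloatType *out, int n)
{
    int i;

    if (in == NULL || out == NULL || n <= 0)
        return -1;

    for (i = 0; i < n; ++i)
        in[i] = 1.0;

    /* create() allocates device copies but does not transfer any data. */
#pragma acc data create(in[0:n], out[0:n])
    {
        /* Note the "acc": this copies in[] from host to device. */
#pragma acc update device(in[0:n])
#pragma acc kernels present(in[0:n], out[0:n])
        for (i = 0; i < n; ++i)
            out[i] = in[i] + 1.0;
        /* Copy out[] from device back to host. */
#pragma acc update host(out[0:n])
    }
    return 0;
}
```

Calling add_one_on_device(in, out, 1024) on two malloc'ed buffers and then printing out[0] and out[1] reproduces the test from the first post.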
PaulPa



Joined: 02 Aug 2012
Posts: 35

Posted: Tue Nov 27, 2012 12:34 am    Post subject: update directive - how to?

Indeed, I missed the "acc". Thank you.

Is it possible to have the compiler issue a warning when a pragma is not recognized?

However, my real application still does not work. The compiler reports the following:
Code:

         Accelerator clause: upper bound for dimension 1 of array 'array_in' is unknown
         Accelerator clause: upper bound for dimension 0 of array 'array_in' is unknown
         Generating update device(array_in[0:3][0:num_entries])

         Accelerator clause: upper bound for dimension 1 of array 'array_out' is unknown
         Accelerator clause: upper bound for dimension 0 of array 'array_out' is unknown
         Generating update host(array_out[0:3][0:num_entries])


This results in the following run-time error:
Code:
call to cuMemcpyDtoH returned error 1: Invalid value


My code looks something like the following:
Code:

 void some_function(my_struct_t* my_struct){
    float** array_in = my_struct->array_in;
    float** array_out = my_struct->array_out;
    #pragma acc update device(array_in[0:3][0:num_entries])
    //launch some kernel (similar to the example above)
    #pragma acc update host(array_out[0:3][0:num_entries])
}

 int main(){
    my_struct_t* my_struct;
    my_struct = my_struct_init(); //initializes and allocates the data
    float** array_in = my_struct->array_in;
    float** array_out = my_struct->array_out;

#pragma acc data create(array_out[0:3][0:num_entries], array_in[0:3][0:num_entries])
    some_function(my_struct);
 }


I guess this has something to do with the function call between the "acc data create" and the update directives, but I'm not sure.
Is there a way to fix this?

Any help is much appreciated.

Best,
Paul
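
One layout that sidesteps the pointer-to-pointer bounds is to keep each array in a single contiguous buffer and pass its length explicitly. The sketch below illustrates that idea only; it is not a diagnosis of the error above, and it has not been checked against PGI 12.9. The function names process_flat and run_flat, the row-major layout, and the "times two" kernel are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for some_function(): in and out each point to one
 * contiguous buffer of `total` floats that the caller has already placed
 * in a data region. */
void process_flat(float *in, float *out, int total)
{
    int i;

    /* The bounds are explicit, so the compiler knows how much to copy. */
#pragma acc update device(in[0:total])
#pragma acc kernels present(in[0:total], out[0:total])
    for (i = 0; i < total; ++i)
        out[i] = 2.0f * in[i];
#pragma acc update host(out[0:total])
}

/* Hypothetical stand-in for the data region in main(): element (r, c) of
 * the logical rows x num_entries array lives at index r * num_entries + c. */
int run_flat(float *in, float *out, int rows, int num_entries)
{
    int total = rows * num_entries;

    if (in == NULL || out == NULL || rows <= 0 || num_entries <= 0)
        return -1;

#pragma acc data create(in[0:total], out[0:total])
    {
        process_flat(in, out, total);
    }
    return 0;
}
```

With rows = 3 this corresponds to the [0:3][0:num_entries] shape in the post, stored as one block of 3 * num_entries floats instead of three separately allocated rows.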
PaulPa



Joined: 02 Aug 2012
Posts: 35

Posted: Thu Nov 29, 2012 12:47 am    Post subject: update directive - how to?

Here is yet another related problem I'm facing:

Is there an OpenACC equivalent to cudaMemcpy?
There is acc_malloc(..), which lets me allocate data on the GPU, and there is a deviceptr clause to pass this memory to an OpenACC region. To solve the problem above, however, I need a device-to-host copy where I can specify the destination.

+Paul
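
For compilers that implement OpenACC 2.0 or later, the runtime API includes acc_memcpy_to_device and acc_memcpy_from_device, which take an explicit destination, source, and byte count. Whether a given compiler release provides them needs to be checked in its documentation. The sketch below assumes they are available; the wrapper names (dev_malloc, dev_free, copy_to_device, copy_from_device, add_one_explicit) are hypothetical, and the host fallback exists only so that the code also builds without OpenACC.

```c
#include <stdlib.h>
#include <string.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical wrappers: use the OpenACC runtime when it is enabled,
 * otherwise fall back to plain host memory. */
static void *dev_malloc(size_t bytes)
{
#ifdef _OPENACC
    return acc_malloc(bytes);
#else
    return malloc(bytes);
#endif
}

static void dev_free(void *p)
{
    if (p == NULL)
        return;
#ifdef _OPENACC
    acc_free(p);
#else
    free(p);
#endif
}

static void copy_to_device(void *dev_dst, void *host_src, size_t bytes)
{
#ifdef _OPENACC
    acc_memcpy_to_device(dev_dst, host_src, bytes);
#else
    memcpy(dev_dst, host_src, bytes);
#endif
}

static void copy_from_device(void *host_dst, void *dev_src, size_t bytes)
{
#ifdef _OPENACC
    acc_memcpy_from_device(host_dst, dev_src, bytes);
#else
    memcpy(host_dst, dev_src, bytes);
#endif
}

/* Adds 1.0f to n floats on the device and copies the result into a
 * host buffer chosen by the caller. Returns 0 on success, -1 on failure. */
int add_one_explicit(float *host_in, float *host_dst, int n)
{
    size_t bytes = (size_t)n * sizeof(float);
    float *d_in = (float *)dev_malloc(bytes);
    float *d_out = (float *)dev_malloc(bytes);
    int i;

    if (d_in == NULL || d_out == NULL) {
        dev_free(d_in);
        dev_free(d_out);
        return -1;
    }

    copy_to_device(d_in, host_in, bytes);

    /* deviceptr tells the compiler these are already device addresses. */
#pragma acc kernels deviceptr(d_in, d_out)
    for (i = 0; i < n; ++i)
        d_out[i] = d_in[i] + 1.0f;

    copy_from_device(host_dst, d_out, bytes);

    dev_free(d_in);
    dev_free(d_out);
    return 0;
}
```

Because the destination is an ordinary host pointer, the result can be written to any buffer, not only to the host array that a data clause would have paired with the device copy.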