PGI User Forum


present clause after copyin clause

 
Minh



Joined: 01 Mar 2013
Posts: 9

Posted: Wed Jun 05, 2013 12:37 am    Post subject: present clause after copyin clause

Hello,
I have three questions:
1. present clause after copyin clause:

I tried the OpenACC example from this article, http://www.pgroup.com/lit/articles/insider/v5n1a1.htm, with a small modification. I used the present clause in the callee because the main function has already copied the data into device global memory.

In the main function:
Code:

    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        gettimeofday( &t0, NULL );
        saxpy( n, a, x, y );
        gettimeofday( &t1, NULL );
    }


Original callee code:
Code:

       void saxpy( int n, float a, float x[], float y[] ){
          #pragma acc parallel loop pcopyin(x[0:n],a,n) pcopy(y[0:n])
           for( int i = 0; i < n; ++i )
               y[i] += a*x[i];
           #pragma acc wait
       }


Edited code with present clause instead of pcopyin and pcopy:
Code:


      void saxpy( int n, float a, float x[], float y[] ){
           #pragma acc parallel loop present(x[0:n],a,n) present(y[0:n])
           for( int i = 0; i < n; ++i )
               y[i] += a*x[i];
           #pragma acc wait
       }



The program compiled successfully, but it failed when I ran it:

Code:
./saxpy_test_data
FATAL ERROR: data in PRESENT clause was not found on device 1: name=y
 file: <PATH>/openacc_pgc++/saxpy.cpp _Z5saxpyifPfS_ line:1


I think we already copied y into device memory in the main function, so why was y not available in device memory at run time?

2. How do I deallocate memory that was allocated by a copy/create clause?

3. What is pgi_uacc_cuda_fill? It shows up when I profile my application.

Thank you very much,
Regards,
Minh
mkcolg



Joined: 30 Jun 2004
Posts: 6119
Location: The Portland Group Inc.

Posted: Wed Jun 05, 2013 10:20 am

Hi Minh,

Quote:
I think we already copied y into device memory in the main function, so why was y not available in device memory at run time?

You need to do a few more things. First, expand the outer data region so it encompasses all calls to saxpy (you're missing the first, initialization call). Second, add an "update" directive to refresh the device copy of "y" after it is reset on the host. Third, remove "a" and "n" from the present clause, since they aren't present on the device. Finally, update the Makefile so that "pgc++ -acc" is used to compile the main program; without "-acc" the data directive in main is ignored, so nothing is ever copied to the device.

Code:
#pragma acc data copyin(x[0:n]) copy(y[0:n])
{
    saxpy( n, a, x, y );        // first time to initialize the system
    for( int i = 0; i < n; ++i ) y[i] = i*100;

#pragma acc update device(y[0:n])

    // now for real, time it
    gettimeofday( &t0, NULL );
    saxpy( n, a, x, y );
    gettimeofday( &t1, NULL );
}
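
For reference, the pieces above can be put together as a minimal self-contained sketch. The driver name run_saxpy, the input values, and the use of std::vector are assumptions for illustration and are not part of the original example; the timing calls are left out. If the file is built without OpenACC support, the pragmas are ignored and the loops run on the host with the same result.

```cpp
#include <vector>

// Callee: asserts that x and y are already on the device.
// The scalars a and n carry no data clause.
void saxpy(int n, float a, float *x, float *y) {
    #pragma acc parallel loop present(x[0:n], y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// Hypothetical driver (not from the thread): returns y[n-1] after the
// second saxpy call. Assumes n >= 1.
float run_saxpy(int n, float a) {
    std::vector<float> xv(n), yv(n);
    float *x = xv.data();
    float *y = yv.data();
    for (int i = 0; i < n; ++i) {
        x[i] = static_cast<float>(i);
        y[i] = 0.0f;
    }

    // One data region around every call to saxpy.
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        saxpy(n, a, x, y);                  // warm-up call

        for (int i = 0; i < n; ++i)         // reset y on the host
            y[i] = i * 100.0f;

        #pragma acc update device(y[0:n])   // push the reset values to the device

        saxpy(n, a, x, y);                  // the call that would be timed
    }                                       // y is copied back to the host here
    return y[n - 1];
}
```

With n = 4 and a = 2, the last element is 300 + 2*3, so run_saxpy(4, 2.0f) returns 306.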


Quote:
2. How do I deallocate memory that was allocated by a copy/create clause?
Device memory allocation and deallocation are handled automatically: memory is allocated when a data or compute region is entered and released when that region is exited.
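
As a sketch of that lifetime rule, here is a scratch array that exists on the device only for the duration of one data region. The helper name sum_of_squares is hypothetical and not part of the original example.

```cpp
#include <vector>

// Hypothetical helper: sums i*i for i in [0, n) using a device scratch array.
float sum_of_squares(int n) {
    std::vector<float> tv(n);
    float *tmp = tv.data();
    float sum = 0.0f;

    #pragma acc data create(tmp[0:n])       // device allocation happens on entry
    {
        #pragma acc parallel loop present(tmp[0:n])
        for (int i = 0; i < n; ++i)
            tmp[i] = static_cast<float>(i) * static_cast<float>(i);

        #pragma acc parallel loop present(tmp[0:n]) reduction(+:sum)
        for (int i = 0; i < n; ++i)
            sum += tmp[i];
    }                                       // device copy of tmp is released on exit
    return sum;
}
```

No explicit free appears anywhere: the closing brace of the data region is the deallocation point. If explicit control is wanted, OpenACC 2.0 adds unstructured "enter data" and "exit data" directives; whether a given compiler release supports them should be checked against its documentation.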

Quote:
3. What is pgi_uacc_cuda_fill? It shows up when I profile my application.
It's a PGI runtime routine that performs an optimized device-to-device data copy.

- Mat
Minh



Joined: 01 Mar 2013
Posts: 9

Posted: Thu Jun 06, 2013 3:03 am

Hi Mat,
Thanks a lot for your detailed answer, but I am still confused.

1. We use "a" and "n" in this saxpy kernel. If we remove them from the data clauses, how can the GPU compute the kernel?

Code:
  void saxpy( int n, float a, float x[], float y[] ){
           #pragma acc parallel loop present(x[0:n]) present(y[0:n])
           for( int i = 0; i < n; ++i )
               y[i] += a*x[i];
           #pragma acc wait
       }
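
A note on this question, offered as a sketch and not a definitive answer: scalars such as "a" and "n" generally do not need a data clause, because the compiler hands their values to the kernel when it launches. Listing "a" in a firstprivate clause, as below, only makes that by-value behavior explicit. The names saxpy_scalars and demo_scalars are hypothetical, and pcopyin/pcopy are used so the function also works when no enclosing data region exists.

```cpp
// Arrays get data clauses; the scalar a is passed by value (firstprivate),
// and n is left implicit to show that no clause is required at all.
void saxpy_scalars(int n, float a, const float *x, float *y) {
    #pragma acc parallel loop firstprivate(a) pcopyin(x[0:n]) pcopy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// Hypothetical check: returns the last element after one call.
float demo_scalars() {
    float x[3] = {1.0f, 2.0f, 3.0f};
    float y[3] = {10.0f, 20.0f, 30.0f};
    saxpy_scalars(3, 2.0f, x, y);
    return y[2];                            // 30 + 2*3
}
```

The present clause is a statement about arrays that were mapped by an earlier data region; a scalar argument was never mapped, which is why putting it in present fails at run time.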


2. As I understand it, data in device memory is no longer accessible outside the data/compute region. So is pcopy only valid for variables allocated via acc_malloc?
Quote:
Device memory allocation and deallocation are handled automatically: memory is allocated when a data or compute region is entered and released when that region is exited.
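
On the pcopy part of this question, a hedged sketch: pcopy (present_or_copy) applies to ordinary host arrays, not only to acc_malloc memory. If an enclosing data region has already placed the array on the device, the clause reuses that copy; otherwise it allocates, copies in, and copies back around the compute region. The names saxpy_pcopy and call_both_ways are hypothetical.

```cpp
#include <vector>

void saxpy_pcopy(int n, float a, float *x, float *y) {
    #pragma acc parallel loop pcopyin(x[0:n]) pcopy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// Hypothetical driver: calls the same function with and without an
// enclosing data region and returns y[0]. Assumes n >= 1.
float call_both_ways(int n, float a) {
    std::vector<float> xv(n, 1.0f), yv(n, 0.0f);
    float *x = xv.data();
    float *y = yv.data();

    // 1) No data region: x and y are not present, so the clauses
    //    allocate, copy in, run the loop, copy y out, and free.
    saxpy_pcopy(n, a, x, y);                // y[i] == a

    // 2) Inside a data region: x and y are found present, so the call
    //    itself moves no data.
    #pragma acc data copyin(x[0:n]) copy(y[0:n])
    {
        saxpy_pcopy(n, a, x, y);
    }                                       // y[i] == 2*a, copied back here
    return y[0];
}
```

After the data region closes, the device copies are gone, but the host arrays x and y remain valid and can be mapped again by a later region.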


3.
Quote:
It's a PGI runtime routine that performs an optimized device-to-device data copy.


So does this involve two GPUs? Could you explain this in more detail?

Thank you,
Regards,
Minh