PGI User Forum


OpenMP, OpenACC and acc_set_device_num
Neldan



Joined: 12 Feb 2013
Posts: 11

Posted: Tue Mar 05, 2013 11:30 am

I just updated to the newest version of pgcc and now it seems to work, but I'm still having a problem at execution time.

During execution, the program prints an "Invalid handle" error.

My code is this:
Code:

        int sizeR = numRows1*numRows2;

        /* firstprivate: every thread starts from the caller's result pointer
           (a plain private copy would be uninitialized) */
        #pragma omp parallel num_threads(2) firstprivate(result)
        {
                int th = omp_get_thread_num();
#if _OPENACC
                acc_init(acc_device_nvidia);
                acc_set_device_num(th+1,acc_device_nvidia);   /* one GPU per thread */
#endif
                fprintf(stdout,"THREAD(%d) - Launched thread.\n",th);
                fprintf(stdout,"THREAD(%d) - Selected device: %d\n",th,acc_get_device_num(acc_device_nvidia));
                /* rows [bI, eI) of the result belong to this thread */
                int bI = th*(numRows1/2);
                int eI = numRows1/((!th)+1);
                fprintf(stdout,"THREAD(%d) - begin I: %d, end I: %d\n",th,bI,eI);
                /* matching element range [bR, eR) of result */
                int bR = th*(sizeR/2);
                int eR = (sizeR/((!th)+1));
                fprintf(stdout,"THREAD(%d) - size R: %d, begin R: %d, end R: %d\n",th,sizeR,bR,eR);
                result = &result[bR];   /* point at this thread's slice */

                #pragma acc kernels copyin(m1[0:numRows1*numColumns1],m2[0:numRows2*numColumns2]), copyout(result[0:eR-bR])
                {
                        int i;
                        #pragma acc loop gang vector(256), independent
                        for (i=bI;i<eI;i++)   /* start at bI, not 0, so (i-bI) is never negative */
                        {
                                int j;
                                #pragma acc loop gang vector(2) independent
                                for(j=0;j<numRows2;j++)
                                {
                                        real_t acum = 0;
                                        int k;
                                        for(k=0;k<numColumns1;k++) {
                                                acum += m1[i+k*numColumns1] * m2[j*numColumns2+k];
                                        }
                                        result[(i-bI)*numRows1+j] = acum;   /* index relative to the slice */
                                }
                        }
                }
        }


I use a matrix size of 5000x5000.

The output is:

Quote:
THREAD(0) - Launched thread.
THREAD(0) - Selected device: 1
THREAD(0) - begin I: 0, end I: 50
THREAD(0) - size R: 10000, begin R: 0, end R: 5000
THREAD(1) - Launched thread.
THREAD(1) - Selected device: 2
THREAD(1) - begin I: 50, end I: 100
THREAD(1) - size R: 10000, begin R: 5000, end R: 10000
call to cuLaunchKernel returned error 400: Invalid handle
call to cuMemFree returned error 700: Launch failed
mkcolg



Joined: 30 Jun 2004
Posts: 4996
Location: The Portland Group Inc.

Posted: Tue Mar 05, 2013 12:09 pm

Hi Neldan,

Unfortunately, all this tells me is that the kernel failed for some reason. To narrow down the issue, can you try running with a single OpenMP thread? Also, try removing the schedule clauses, i.e. the gang and vector clauses, and let the compiler schedule the loops.

- Mat
Neldan



Joined: 12 Feb 2013
Posts: 11

Posted: Tue Mar 05, 2013 12:22 pm

mkcolg wrote:
...can you try running with a single OpenMP thread?


With a single OpenMP thread, the kernel works fine.
Neldan



Joined: 12 Feb 2013
Posts: 11

Posted: Wed Mar 06, 2013 10:38 am

I have been doing some tests using 'fork' instead of OpenMP, and it works fine. So I think the problem is in calling the kernel from OpenMP threads.
mkcolg



Joined: 30 Jun 2004
Posts: 4996
Location: The Portland Group Inc.

Posted: Wed Mar 06, 2013 12:01 pm

Quote:
I have been doing some tests using 'fork' instead of OpenMP, and it works fine. So I think the problem is in calling the kernel from OpenMP threads.
OK. Can you send a reproducible example to PGI Customer Service (trs@pgroup.com) so we can determine the issue?

Thanks,
Mat
All times are GMT - 7 Hours
Page 2 of 3

 