PGI User Forum


gang and worker

 
uestc0626



Joined: 14 May 2012
Posts: 6

Posted: Tue Apr 23, 2013 8:41 pm    Post subject: gang and worker

To test the gang and worker clauses, I wrote this small program:
Code:
#include<stdio.h>
#include<stdlib.h>
#define N 1000
#define M 1000

int  main()
{
   int *A;

   A=(int *)malloc(N*M*sizeof(int));
   
   for(int i=0;i<N*M;i++){
         A[i]=-1;
   }

   #pragma acc kernels loop gang(100),worker(128)
   for(int i=0;i<N*M;i++)
   {
      A[i]=i;   
   }
      
   for(int i=0;i<10;i++)
      printf("A=%d\n",A[i]);
   return 0;
}

On Linux, the compiler output is:
[wcj@localhost example]$ pgcc -acc -Minfo gang.c
NOTE: your trial license will expire in 8 days, 13.7 hours.
main:
     12, Memory set idiom, loop replaced by call to __c_mset4
     17, Generating present_or_copyout(A[0:1000000])
         Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     18, Loop is parallelizable
         Accelerator kernel generated
         18, #pragma acc loop gang(100), vector(128) /* blockIdx.x threadIdx.x */
And the output when running it:
[wcj@localhost example]$ ./a.out
A=0
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9

Accelerator Kernel Timing data
/home/wcj/Yunio/openacc/example/gang.c
  main  NVIDIA  devicenum=0
        time(us): 703
        18: kernel launched 1 times
            grid: [7813]  block: [128]
             device time(us): total=80 max=80 min=80 avg=80
            elapsed time(us): total=96 max=96 min=96 avg=96
        27: data copyout reached 1 times
             device time(us): total=623 max=623 min=623 avg=623
My question is: why is the grid size (7813) not equal to the value I set in the gang clause (100)?
I also tested the program on Windows with PGI Workstation, and there the grid size does equal the value in the gang clause.
Why does the grid size differ between the two systems? Could this be a bug in the Linux version of the PGI compiler?
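The 7813 in the Linux run is exactly the number of 128-thread blocks needed to give each of the N*M = 1,000,000 iterations its own thread: 1,000,000 / 128 = 7812.5, rounded up. That suggests (an inference, not something stated in this thread) that the Linux compiler sized the grid from the loop trip count and ignored gang(100). The helper below, blocks_to_cover (a hypothetical name), does that ceiling division in integer arithmetic:

```c
#include <assert.h>

/* Number of thread blocks needed if every loop iteration gets its own
 * thread: ceil(n / block_size), computed without floating point. */
long blocks_to_cover(long n, long block_size)
{
    assert(n >= 0 && block_size > 0);
    return (n + block_size - 1) / block_size;
}
```

For example, blocks_to_cover(1000L * 1000L, 128) returns 7813, whereas gang(100) asks for a grid of only 100 blocks.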

mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Wed Apr 24, 2013 1:29 am

Quote:
Maybe it is the bug of the linux version of PGI compiler?
Yes, it's a known compiler issue (TPR#19149) that's expected to be fixed in next month's (May 2013) 13.5 release.

Note that you should be using "vector" instead of "worker", since "worker" corresponds to the warp size, which is fixed on NVIDIA GPUs.
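Following that advice, the loop from the first post can be written with vector(128) in place of worker(128). The sketch below wraps it in a function, fill_indices (a hypothetical name), and adds an explicit copyout clause because the array length is no longer a compile-time constant; that clause is an addition, not part of the original program. Built with pgcc -acc on a release where the gang issue is fixed, this loop should launch 100 gangs of 128 vector lanes (Mat's 13.5 output shows grid: [100] block: [128]). A compiler without OpenACC support ignores the pragma and runs the loop serially, with the same result.

```c
#include <assert.h>

/* The loop from the original post, using vector(128) instead of
 * worker(128). copyout(A[0:n]) tells the compiler how many elements
 * to copy back, since n is only known at run time. */
void fill_indices(int *A, int n)
{
    assert(n >= 0);
    #pragma acc kernels loop gang(100) vector(128) copyout(A[0:n])
    for (int i = 0; i < n; i++) {
        A[i] = i;
    }
}
```

Called as fill_indices(A, N*M), this fills A exactly as the original main does.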

- Mat
uestc0626



Joined: 14 May 2012
Posts: 6

Posted: Sun May 05, 2013 7:02 pm

Thanks, Mat.
mkcolg



Joined: 30 Jun 2004
Posts: 6125
Location: The Portland Group Inc.

Posted: Tue May 07, 2013 10:50 am

FYI, I've confirmed that 13.5 will give you the correct gang size:

Code:
% pgcc -acc -Minfo=accel -V13.5 uestc0626.c
main:
     16, Generating present_or_copyout(A[0:1000000])
         Generating NVIDIA code
         Generating compute capability 1.0 binary
         Generating compute capability 2.0 binary
         Generating compute capability 3.0 binary
     17, Loop is parallelizable
         Accelerator kernel generated
         17, #pragma acc loop gang(100), vector(128) /* blockIdx.x threadIdx.x */
% setenv PGI_ACC_TIME 1
% a.out
A=0
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9

Accelerator Kernel Timing data
uestc0626.c
  main  NVIDIA  devicenum=0
        time(us): 381
        17: kernel launched 1 times
            grid: [100]  block: [128]
             device time(us): total=53 max=53 min=53 avg=53
            elapsed time(us): total=68 max=68 min=68 avg=68
        22: data copyout reached 1 times
             device time(us): total=328 max=328 min=328 avg=328
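With grid: [100] and block: [128], only 12,800 threads run for 1,000,000 iterations, so each thread has to handle several iterations. One common way to map such a loop is a grid-stride loop; the thread does not show the code PGI actually generates, so treat that mapping as an assumption. The host-side sketch below, iterations_for_thread (a hypothetical name), simulates it and counts how many iterations a given thread executes:

```c
#include <assert.h>

/* Host-side simulation of a grid-stride mapping (assumed, not taken
 * from PGI's generated code). Thread t of nthreads = gangs * vector_len
 * handles iterations t, t + nthreads, t + 2*nthreads, ... below n.
 * Returns the number of iterations thread t executes. */
long iterations_for_thread(long n, long gangs, long vector_len, long t)
{
    long nthreads = gangs * vector_len;
    long count = 0;
    assert(nthreads > 0 && t >= 0 && t < nthreads);
    for (long i = t; i < n; i += nthreads) {
        count++;
    }
    return count;
}
```

Under this mapping, 1,000,000 = 78 * 12,800 + 1,600, so threads 0 to 1599 run 79 iterations and the remaining threads run 78; every element is still written even though the grid is only 100 blocks.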
All times are GMT - 7 Hours