| Author |
Message |
uestc0626
Joined: 14 May 2012 Posts: 2
|
Posted: Tue Apr 23, 2013 8:41 pm Post subject: gang and worker |
|
|
To test the gang and worker clauses, I wrote this small program.
| Code: |
#include <stdio.h>
#include <stdlib.h>
#define N 1000
#define M 1000
int main()
{
    int *A;
    A = (int *)malloc(N * M * sizeof(int));
    for (int i = 0; i < N * M; i++) {
        A[i] = -1;
    }
    #pragma acc kernels loop gang(100),worker(128)
    for (int i = 0; i < N * M; i++)
    {
        A[i] = i;
    }
    for (int i = 0; i < 10; i++)
        printf("A=%d\n", A[i]);
    return 0;
}
|
Under Linux, the compile information is:
[wcj@localhost example]$ pgcc -acc -Minfo gang.c
NOTE: your trial license will expire in 8 days, 13.7 hours.
main:
12, Memory set idiom, loop replaced by call to __c_mset4
17, Generating present_or_copyout(A[0:1000000])
Generating NVIDIA code
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
18, Loop is parallelizable
Accelerator kernel generated
18, #pragma acc loop gang(100), vector(128) /* blockIdx.x threadIdx.x */
And the execution information:
[wcj@localhost example]$ ./a.out
A=0
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9
Accelerator Kernel Timing data
/home/wcj/Yunio/openacc/example/gang.c
main NVIDIA devicenum=0
time(us): 703
18: kernel launched 1 times
grid: [7813] block: [128]
device time(us): total=80 max=80 min=80 avg=80
elapsed time(us): total=96 max=96 min=96 avg=96
27: data copyout reached 1 times
device time(us): total=623 max=623 min=623 avg=623
My question is: why is the grid size not equal to the value set in the gang clause?
I also tested the program under Windows using PGI Workstation. There, the execution information shows a grid size equal to the value set in the gang clause.
Why does the grid size differ between the two systems? Maybe it is a bug in the Linux version of the PGI compiler?
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Wed Apr 24, 2013 1:29 am Post subject: |
|
|
| Quote: | Maybe it is the bug of the linux version of PGI compiler? |
Yes, it's a known compiler issue (TPR#19149) that's expected to be fixed in next month's (May 2013) 13.5 release.
Note that you should be using "vector" instead of "worker" since "worker" corresponds to the warp size which is fixed on NVIDIA GPUs.
- Mat |
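As a rough sketch of the suggestion above, the loop from the original post could be written with `vector` in place of `worker`. The function name `fill_indices` and the explicit `copyout` clause are assumptions added here so the example stands alone; they are not part of the original program. The schedule matches what the compiler already reports in its -Minfo output (`gang(100), vector(128)`).

```c
#include <assert.h>

/* Hypothetical helper (not from the original post): sets a[i] = i.
 * The copyout(a[0:n]) clause is an assumption; it states how much of the
 * array to copy back, since the size is not visible from a bare pointer. */
void fill_indices(int *a, int n)
{
    assert(n >= 0);

    /* Request 100 gangs with a vector length of 128, as in the thread. */
    #pragma acc kernels loop gang(100) vector(128) copyout(a[0:n])
    for (int i = 0; i < n; i++) {
        a[i] = i;
    }
}
```

When built without -acc, the pragma is ignored and the loop runs serially on the host, so the results can be checked either way.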
|
|
 |
uestc0626
Joined: 14 May 2012 Posts: 2
|
Posted: Sun May 05, 2013 7:02 pm Post subject: |
|
|
Thanks, Mat |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 5001 Location: The Portland Group Inc.
|
Posted: Tue May 07, 2013 10:50 am Post subject: |
|
|
FYI, I've confirmed that 13.5 will give you the correct gang size:
| Code: | % pgcc -acc -Minfo=accel -V13.5 uestc0626.c
main:
16, Generating present_or_copyout(A[0:1000000])
Generating NVIDIA code
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
17, Loop is parallelizable
Accelerator kernel generated
17, #pragma acc loop gang(100), vector(128) /* blockIdx.x threadIdx.x */
% setenv PGI_ACC_TIME 1
% a.out
A=0
A=1
A=2
A=3
A=4
A=5
A=6
A=7
A=8
A=9
Accelerator Kernel Timing data
uestc0626.c
main NVIDIA devicenum=0
time(us): 381
17: kernel launched 1 times
grid: [100] block: [128]
device time(us): total=53 max=53 min=53 avg=53
elapsed time(us): total=68 max=68 min=68 avg=68
22: data copyout reached 1 times
device time(us): total=328 max=328 min=328 avg=328
|
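The two grid sizes reported in this thread can be related with a little arithmetic. This is an interpretation of the numbers, not something stated by the compiler or confirmed in the thread: 7813 is exactly the number of 128-thread blocks needed to give each of the 1,000,000 iterations its own thread, whereas with gang(100) honored, each thread has to cover several iterations. The helper names below are hypothetical.

```c
#include <assert.h>

/* Number of thread blocks needed so that every iteration gets its own thread. */
long blocks_to_cover(long iterations, long block_size)
{
    assert(iterations >= 0 && block_size > 0);
    return (iterations + block_size - 1) / block_size;  /* ceiling division */
}

/* Upper bound on iterations per thread when the grid size is fixed. */
long max_iterations_per_thread(long iterations, long grid, long block_size)
{
    long threads = grid * block_size;
    assert(iterations >= 0 && threads > 0);
    return (iterations + threads - 1) / threads;  /* ceiling division */
}
```

For the values in this thread, `blocks_to_cover(1000000, 128)` gives 7813, and `max_iterations_per_thread(1000000, 100, 128)` gives 79.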
|
|
|
 |
|