| Author |
Message |
luxuia
Joined: 01 Apr 2013 Posts: 4
|
Posted: Wed Apr 17, 2013 7:45 pm Post subject: the subarray usage in copy |
|
|
I tried to use | Code: | | copyin(arr[0:total/2], arr[total/2:total/2]) | instead of | Code: | | copyin(arr[0:total]) | because my data is large and has to be split.
But the result is correct with the second form and wrong with the first.
I don't know what is happening.
Thanks in advance. |
|
|
 |
luxuia
Joined: 01 Apr 2013 Posts: 4
|
Posted: Wed Apr 17, 2013 7:58 pm Post subject: |
|
|
I wrote a small test like this:
| Code: | int arr[100];
for (int i = 0; i < 100; ++i) {
    arr[i] = i*2;
}
int ans[100];
#pragma acc kernels copyin(arr[0:50], arr[50:50]), copyout(ans[100])
for (int i = 0; i < 100; ++i) {
    ans[i] = arr[i];
}
for (int i = 0; i < 100; ++i) {
    fprintf(stderr, "%d\t", ans[i]);
}
|
and found that arr[0:50] is not actually copied into device memory.
What should I do to copy in two subsets of an array?
Thanks. |
|
|
 |
luxuia
Joined: 01 Apr 2013 Posts: 4
|
Posted: Wed Apr 17, 2013 8:37 pm Post subject: |
|
|
| Code: | | copyin(temp2) followed by copy[in](temp2); copyin attribute ignored |
That is what the compiler message says.
What can I do? :(
Deeply sad. |
|
|
 |
luxuia
Joined: 01 Apr 2013 Posts: 4
|
Posted: Wed Apr 17, 2013 8:53 pm Post subject: |
|
|
I think I found a workaround, but it is very ugly:
I can use _arr instead of arr when the warning occurs.
But it is very slow.
I am afraid to look at the output of the `time` command.
Is there any other way to do this? |
|
|
 |
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Thu Apr 18, 2013 9:22 am Post subject: |
|
|
Hi luxuia,
No, you can't put the same variable more than once in a data copy clause. There has to be a single base host address from which to map the host and device data.
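To illustrate the single-base-address rule: when the pieces are adjacent, as the two halves in the test above are, they can be named once as one contiguous subarray in `[start:length]` form. The following is only a minimal sketch; the function name `copy_whole` and its parameters are hypothetical and not from the thread, and `restrict` encodes the assumption that the two arrays do not overlap.

```c
#include <assert.h>

/* Hypothetical helper, not from the thread: copies arr into ans inside
   one OpenACC kernels region. Assumes arr and ans do not overlap. */
void copy_whole(const int *restrict arr, int *restrict ans, int n)
{
    assert(n >= 0);
    /* arr appears exactly once in the data clauses, as a single
       contiguous subarray that starts at index 0 and has length n. */
    #pragma acc kernels copyin(arr[0:n]) copyout(ans[0:n])
    for (int i = 0; i < n; ++i) {
        ans[i] = arr[i];
    }
}
```

Calling `copy_whole(arr, ans, 100)` covers the same elements as the two halves `arr[0:50]` and `arr[50:50]`. Built without OpenACC support, the pragma is ignored and the loop simply runs on the host.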
Though, I question why you need to break apart the array this way. Can you please clarify what you mean by "as i have big data" and why you believe it needs to be split?
If you don't have enough memory on the device to hold the entire array, breaking it apart in the data clause won't help. In this case, you need to strip-mine the loop into a block size that fits on the device. Something like:
| Code: |
% cat test_block.c
#include <stdio.h>
#include <stdlib.h>
int main () {
    const int size = 100;
    int arr[size];
    int ans[size];
    int block_size = 50;   /* number of elements moved to the device per block */
    for (int i = 0; i < size; ++i) {
        arr[i] = i*2;
    }
    for (int b=0; b < size; b+=block_size) {   /* one block per iteration */
#pragma acc kernels copyin(arr[b:block_size]), copyout(ans[b:block_size])
        for (int i = b; i < b+block_size; ++i) {
            ans[i] = arr[i];
        }
    }
    for (int i = 0; i < size; ++i) {
        printf("%d: %d\n", i, ans[i]);
    }
    exit(0);
}
sb-colgrove:/local/home/colgrove% pgcc test_block.c -acc -Minfo=accel
main:
12, Generating copyout(ans[b:block_size])
Generating copyin(arr[b:block_size])
13, Generating NVIDIA code
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
14, Loop is parallelizable
Accelerator kernel generated
14, #pragma acc loop gang, vector(64) /* blockIdx.x threadIdx.x */
% setenv PGI_ACC_NOTIFY 1
% a.out
launch CUDA kernel file=../test_block.c function=main line=14 device=0 grid=1 block=64
launch CUDA kernel file=../test_block.c function=main line=14 device=0 grid=1 block=64
0: 0
1: 2
2: 4
3: 6
4: 8
5: 10
6: 12
7: 14
8: 16
....
94: 188
95: 190
96: 192
97: 194
98: 196
99: 198
|
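The example above relies on `size` being an exact multiple of `block_size`. When it is not, the last block has to be shortened so the subarray does not run past the end of the array. The following is a minimal sketch of that variant, not part of the original reply; the function name `copy_in_blocks` and the variable `len` are hypothetical, and `restrict` encodes the assumption that the two arrays do not overlap.

```c
#include <assert.h>

/* Hypothetical helper, not from the thread: strip-mines the copy so that
   at most block_size elements are mapped to the device at a time.
   Assumes arr and ans do not overlap. */
void copy_in_blocks(const int *restrict arr, int *restrict ans,
                    int size, int block_size)
{
    assert(block_size > 0);
    for (int b = 0; b < size; b += block_size) {
        /* Shorten the final block when size is not a multiple of block_size. */
        int len = (size - b < block_size) ? (size - b) : block_size;
        #pragma acc kernels copyin(arr[b:len]) copyout(ans[b:len])
        for (int i = b; i < b + len; ++i) {
            ans[i] = arr[i];
        }
    }
}
```

For example, `copy_in_blocks(arr, ans, 100, 30)` would process blocks of 30, 30, 30 and 10 elements. Built without OpenACC support, the pragma is ignored and the loops run on the host.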
Hope this helps,
Mat |
|
|
 |
|