AdamSimpson
Joined: 08 Feb 2011 Posts: 9
Posted: Mon Mar 18, 2013 5:17 am Post subject: OpenACC nested C structs
Hi,
I have run into a problem with OpenACC in pgi/12.10.0: it does not seem to allow C structs that contain more than one struct member. For example, the following code will not parallelize, and the -Minfo output reports an invalid loop. However, if I have only one double3 struct inside the st struct, it seems to parallelize. Has this been fixed in a newer version?
Thanks,
Adam
Code:
typedef struct ST st;
typedef struct TUPLE double3;
#include <stdio.h>
#include <stdlib.h>
struct TUPLE {
    double x;
    double y;
    double z;
};
/* st nests two double3 structs */
struct ST {
    double3 pos;
    double3 vel;
};
int main(int argc, char *argv[])
{
    st *mystructs = (st*) malloc(sizeof(st)*1000);
    int i;
    st tmpST;
#pragma acc kernels copy(mystructs[0:1000])
    for(i=0; i<1000; i++){
        /* whole-struct assignment: copies both nested double3 members */
        tmpST = mystructs[i];
    }
    return 0;
}
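As posted, the reproducer reads malloc'd memory that is never initialized and never uses tmpST after the loop, so it can show whether the loop compiles but not whether values are copied correctly. When trying workarounds, a small host-side harness along these lines may help. This is only a sketch: fill_structs and st_equal are hypothetical helpers, not part of the original code, and the typedefs are condensed from the listing above.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the reproducer, with the typedefs condensed. */
typedef struct TUPLE { double x, y, z; } double3;
typedef struct ST { double3 pos; double3 vel; } st;

/* Hypothetical helper: give every member of every element a distinct,
   known value (element i holds 6*i .. 6*i+5). */
void fill_structs(st *a, int n)
{
    assert(a != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        a[i].pos.x = 6.0 * i;
        a[i].pos.y = 6.0 * i + 1.0;
        a[i].pos.z = 6.0 * i + 2.0;
        a[i].vel.x = 6.0 * i + 3.0;
        a[i].vel.y = 6.0 * i + 4.0;
        a[i].vel.z = 6.0 * i + 5.0;
    }
}

/* Hypothetical helper: compare member by member rather than with memcmp,
   so any padding bytes the compiler may insert cannot affect the result. */
int st_equal(const st *a, const st *b)
{
    return a->pos.x == b->pos.x && a->pos.y == b->pos.y && a->pos.z == b->pos.z
        && a->vel.x == b->vel.x && a->vel.y == b->vel.y && a->vel.z == b->vel.z;
}
```

For example, after fill_structs(a, n), the whole-struct assignment st t = a[i]; should satisfy st_equal(&t, &a[i]), and the same check can be applied to the result of a member-by-member copy.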
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Mar 18, 2013 9:59 am
Hi Adam,
The problem is the implicit deep copy between the two structs in tmpST = mystructs[i]. If you change this to an explicit, member-by-member copy, then you should be fine.
Code:
% cat struct.c
typedef struct ST st;
typedef struct TUPLE double3;
#include <stdio.h>
#include <stdlib.h>
struct TUPLE {
    double x;
    double y;
    double z;
};
struct ST {
    double3 pos;
    double3 vel;
};
int main(int argc, char *argv[])
{
    st *mystructs = (st*) malloc(sizeof(st)*1000);
    int i;
    st tmpST;
#pragma acc kernels copy(mystructs[0:1000])
    for(i=0; i<1000; i++){
        /* copy each scalar member explicitly instead of tmpST = mystructs[i] */
        tmpST.pos.x = mystructs[i].pos.x;
        tmpST.pos.y = mystructs[i].pos.y;
        tmpST.pos.z = mystructs[i].pos.z;
        tmpST.vel.x = mystructs[i].vel.x;
        tmpST.vel.y = mystructs[i].vel.y;
        tmpST.vel.z = mystructs[i].vel.z;
    }
    return 0;
}
% pgcc -acc -Minfo=accel struct.c
main:
24, Generating present_or_copy(mystructs[0:1000])
Generating NVIDIA code
Generating compute capability 1.0 binary
Generating compute capability 2.0 binary
Generating compute capability 3.0 binary
25, Loop is parallelizable
Accelerator kernel generated
25, #pragma acc loop gang, vector(128) /* blockIdx.x threadIdx.x */
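If the hand-written assignments become tedious for larger structs, a preprocessor macro can generate them. Because macros are expanded before compilation, the compiler sees exactly the same scalar assignments as in the version above. This is a sketch only: COPY_DOUBLE3 and copy_structs are hypothetical names, the loop copies into a separate output array (the original loop writes to a single temporary), and it has not been run through pgcc 12.10.

```c
#include <assert.h>
#include <stddef.h>

typedef struct TUPLE { double x, y, z; } double3;
typedef struct ST { double3 pos; double3 vel; } st;

/* Hypothetical helper macro: expands to three scalar assignments, so no
   whole-struct assignment appears in the code the compiler sees. */
#define COPY_DOUBLE3(dst, src) \
    do { (dst).x = (src).x; (dst).y = (src).y; (dst).z = (src).z; } while (0)

/* Copy n structs from in to out, one scalar member at a time.
   restrict tells the compiler the two arrays do not overlap. */
void copy_structs(const st *restrict in, st *restrict out, int n)
{
    assert(in != NULL && out != NULL && n >= 0);
    int i;
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    for (i = 0; i < n; i++) {
        COPY_DOUBLE3(out[i].pos, in[i].pos);
        COPY_DOUBLE3(out[i].vel, in[i].vel);
    }
}
```

Without -acc the #pragma is ignored and the function simply runs on the host. A macro is used rather than a helper function because, as I understand it, OpenACC 1.0 has no routine directive, so a function call inside the accelerator region would have to be inlined by the compiler.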
Hope this helps,
Mat
AdamSimpson
Joined: 08 Feb 2011 Posts: 9
Posted: Mon Mar 18, 2013 12:07 pm
Thanks Mat, that works.
AdamSimpson
Joined: 08 Feb 2011 Posts: 9
Posted: Tue Mar 19, 2013 5:49 am
Mat,
Just so I'm clear, is this considered a bug that may be fixed in the future?
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Mar 20, 2013 9:29 am
No, it's more of a limitation. However, I've put in a feature request (TPR#19218), and we'll see what can be done. Your case is simpler than others I've seen, so it may be possible.
- Mat