ghandurah
Joined: 03 Nov 2009 Posts: 9
|
Posted: Mon Dec 21, 2009 1:12 am Post subject: PGI 10.0 on Windows XP (Accelerator) |
|
|
Hi,
I've been experimenting with the trial version of PGI 10.0 for Windows (specifically the Accelerator), and I am getting strange results!
| Code: |
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
int main(){
    printf("trial3dsubsetDiffSizes\n");
    int a[100][99][89];
    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                a[i][j][k]=i+j+k;
    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);
#pragma acc region
    {
        for (int k=5;k<60;k++)
            for (int j=3;j<70;j++)
                for (int i=50;i<99;i++)
                    a[i][j][k]*=5;
    }
    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);
    printf("finished\n");
    return 0;
}
|
It compiles but I get no output at all:
| Quote: |
PGI$ pgcc -ta=nvidia,time,keepgpu -Minfo=all,accel trial3DsubsetDiffSizes.c
NOTE: your trial license will expire in 7 days, 13.1 hours.
main:
26, Generating copy(a[50:98][3:69][5:59])
28, Loop is parallelizable
Accelerator kernel generated
28, #pragma acc for parallel, vector(55)
29, Loop is parallelizable
30, Loop is parallelizable
PGI$ trial3DsubsetDiffSizes.exe
PGI$
|
On the other hand, a similar program compiled with -ta=nvidia,time -Minfo=accel runs correctly, but the compiler prints no -Minfo messages and no timing information is reported either.
My real program follows the same idea as the code above: I need to accelerate a 3-level-deep loop nest over a 3D array, or over a 1D array that uses macros to calculate the 3D index. What's the best way to do this with the Accelerator?
Note: the actual program uses dynamically allocated arrays, not statically allocated ones as in this example.
Thanks |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Mon Dec 21, 2009 9:01 am Post subject: |
|
|
Hi ghandurah,
Since I'm out of the office for the next two weeks, and Windows requires a user to be logged into the console in order to run on a GPU, I'm not able to recreate your error. The code does seem to run correctly on Linux, though.
Does the code print the values if you compile without "-ta"? How about with just "-ta=nvidia"?
As for the dynamic arrays, you may need to use "-Msafeptr" or add the C99 restrict keyword to each of your pointers. Without this, the compiler must presume that your pointers could overlap and cannot generate accelerator code.
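A minimal sketch of the restrict suggestion, assuming a simple element-wise loop. The function name scale_by and its parameters are hypothetical and not taken from the original program. In a real build, the loop body would sit inside the accelerator region.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): multiplies a[] by b[] element-wise.
 * The C99 restrict qualifiers promise the compiler that a and b never overlap,
 * so it does not have to assume the two pointers may alias. */
void scale_by(float *restrict a, const float *restrict b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] *= b[i];
}
```

The promise is the programmer's responsibility: if the two arrays do overlap, the behavior is undefined, so restrict should only be added where the buffers really are distinct allocations.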
- Mat |
|
ghandurah
Joined: 03 Nov 2009 Posts: 9
|
Posted: Tue Dec 22, 2009 1:56 am Post subject: |
|
|
| Quote: |
Does the code print the values if you compile without "-ta"? How about with just "-ta=nvidia"?
|
Nothing at all. |
|
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
|
Posted: Wed Dec 23, 2009 9:51 am Post subject: |
|
|
Hi ghandurah,
It looks like your program is seg faulting because 'a' is too large. Try reducing the size of a to less than 250,000 elements.
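For context, 'a' is a local array of 100*99*89 = 881,100 ints (about 3.4 MB with 4-byte ints), which would not fit in a stack of roughly 1 MB. The 1 MB figure is an assumption about the default stack size on Windows, but it is consistent with the 250,000-element limit above. One way to keep the full size is to allocate the array on the heap. The sketch below assumes the dimensions from the original post; the helper name make_array is hypothetical.

```c
#include <stdlib.h>

enum { NI = 100, NJ = 99, NK = 89 };   /* dimensions from the original post */

/* Hypothetical helper: allocates the 100x99x89 array on the heap instead of
 * the stack and fills it with i+j+k. Returns NULL if malloc fails.
 * The caller releases the array with free(). */
int (*make_array(void))[NJ][NK]
{
    int (*a)[NJ][NK] = malloc(sizeof(int[NI][NJ][NK]));
    if (a == NULL)
        return NULL;
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            for (int k = 0; k < NK; k++)
                a[i][j][k] = i + j + k;
    return a;
}
```

Because the result is a pointer to a [99][89] array, the a[i][j][k] indexing from the original code keeps working unchanged.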
- Mat |
|
ghandurah
Joined: 03 Nov 2009 Posts: 9
|
Posted: Wed Dec 23, 2009 11:47 pm Post subject: |
|
|
Thanks a lot mkcolg, I declared it as a dynamic array and it worked correctly.
Sorry, but I have two more questions.
First question:
I need to compute some values and store them as constants, because they are used as array dimensions in declarations and as array subscripts elsewhere. I declared them before the main function as follows:
| Code: |
const int X=20;
const int Y=10;
const int I=2.5*X+2*Y;
//J same way
//K same way
#define A(i,j,k) A[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way
|
It gives me this error, pointing to the line const int I=2.5*X+2*Y;
PGC-S-0074-Non-constant expression in initializer
So I modified it to:
| Code: |
const int X=20;
const int Y=10;
int Ii=2.5*X+2*Y;
const int I=Ii;
//J same way
//K same way
#define A(i,j,k) A[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way
|
That still gives the same error, so I switched to #define:
| Code: |
#define X 20
#define Y 10
#define I (2.5*X+2*Y)
//J same way
//K same way
#define X1 (X+1)
#define X2 (X+2)
#define A(i,j,k) B[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way
|
I got this:
| Quote: |
PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1158)
PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1162)
main:
461, Accelerator region ignored
464, Accelerator restriction: size of the GPU copy of an array depends on values computed in this loop
465, Accelerator restriction: size of the GPU copy of 'A' is unknown
Accelerator restriction: size of the GPU copy of 'B' is unknown
Accelerator restriction: one or more arrays have unknown size
|
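A note on why all three attempts fail, with a possible workaround. In C (unlike C++), a const int variable is not a constant expression, so it cannot appear in a file-scope initializer, which would explain PGC-S-0074. The #define version compiles, but 2.5*X makes every expression built from I a double, which matches the "Non-integral array subscript" warnings. The sketch below keeps everything in integer arithmetic by using enumerators. The array named table and the function table_len are hypothetical, added only to show that the constant is usable as a dimension.

```c
/* Enumerators are integer constant expressions in C, so they can size arrays
 * and appear in other file-scope initializers. Values follow the post;
 * 2.5*X is rewritten as (5*X)/2 to stay in integer arithmetic. */
enum {
    X = 20,
    Y = 10,
    I = (5 * X) / 2 + 2 * Y   /* 70, the same value as (int)(2.5*X + 2*Y) here */
};

/* Hypothetical table sized by I, showing the constant works at file scope. */
static float table[I + 1];

/* Returns the number of elements in the table. */
int table_len(void)
{
    return (int)(sizeof table / sizeof table[0]);
}
```

Integer macros such as #define I ((5*X)/2 + 2*Y) would work the same way. For odd values of X the division truncates, just as the cast to int would for positive values.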
The rest of the code, after the definitions:
| Code: |
inline void init3Darray (float *arr, int a, int b, int c, float val){
    int index;
    for (int i=1; i<a; i++)
        for (int j=1; j<b; j++)
            for (int k=1; k<c; k++){
                index=i*b*c+j*c+k;
                arr[index]=val;
            }
}
int main(){
    float *A=(float *)malloc((I+1)*(J+2)*(K+2)*sizeof(*A));
    init3Darray(A,I+1,J+2,K+2,0.0);
    //B same way
#pragma acc region
    {
        //all those constants (P1, KP, etc) are declared the same way as I , J , K
        for (int k=P1; k<=KP; k++)
            for (int j=P1; j<=JP; j++)
                for (int i=2; i<=I; i++)
                    A(i,j,k)=B(i,j,k)*A(i,j,k);
    }
    return 0;
}
|
Is it better to use "acc region" or "acc for" for loops like the one above? Any recommendations, given that the arrays will actually be much larger?
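Regarding the "size of the GPU copy ... is unknown" restrictions above: with malloc'd arrays the compiler cannot see the extents, so one option may be to state them in data clauses on the region. The sketch below is written under several assumptions. The clause syntax copy(a[lower:upper]) is inferred from the compiler message copy(a[50:98][3:69][5:59]) earlier in this thread. The _ACCEL guard assumes the compiler defines that macro when accelerator directives are enabled. The names IDX3 and multiply_block are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major offset of (i,j,k) in an array whose last two extents are nj, nk.
 * Hypothetical replacement for the A(i,j,k) macro, with explicit extents. */
#define IDX3(i, j, k, nj, nk) (((size_t)(i) * (nj) + (j)) * (nk) + (k))

/* Multiplies the sub-block [i0..i1] x [j0..j1] x [k0..k1] of a by b in place.
 * Both arrays hold ni*nj*nk floats and must not overlap. */
void multiply_block(float *restrict a, const float *restrict b,
                    int ni, int nj, int nk,
                    int i0, int i1, int j0, int j1, int k0, int k1)
{
    size_t n = (size_t)ni * nj * nk;   /* total element count for the clauses */
    (void)n;                           /* unused when built without -ta */
    assert(i1 < ni && j1 < nj && k1 < nk);
#ifdef _ACCEL
    /* Assumed PGI Accelerator clause syntax: inclusive lower:upper bounds. */
#pragma acc region copy(a[0:n-1]) copyin(b[0:n-1])
#endif
    {
        for (int i = i0; i <= i1; i++)
            for (int j = j0; j <= j1; j++)
                for (int k = k0; k <= k1; k++)
                    a[IDX3(i, j, k, nj, nk)] *= b[IDX3(i, j, k, nj, nk)];
    }
}
```

The loop order here differs from the original: k is innermost because it is the stride-1 index in this layout. Whether that schedule is the best one for the GPU is worth checking against the -Minfo=accel output rather than taking it for granted.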
Second question:
I need to use cutil_inline. When I include it like this:
#include <cutil_inline.h>
the compiler can't find it. So I copied cutil_inline.h into the same directory and used #include "cutil_inline.h" instead. The compiler then found that file, but not the other headers it includes.
Thanks |
|