PGI User Forum

PGI 10.0 on Windows XP (Accelerator)
 
ghandurah



Joined: 03 Nov 2009
Posts: 9

Posted: Mon Dec 21, 2009 1:12 am    Post subject: PGI 10.0 on Windows XP (Accelerator)

Hi,

I've been experimenting with the trial version of PGI 10.0 for Windows (specifically the Accelerator), and I am getting weird behavior!

Code:

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

int main(){

    printf("trial3dsubsetDiffSizes\n");


    int a[100][99][89];

    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                a[i][j][k]=i+j+k;



    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);


#pragma acc region
    {
        for (int k=5;k<60;k++)
            for (int j=3;j<70;j++)
                for (int i=50;i<99;i++)
                    a[i][j][k]*=5;

    }


    for (int k=0;k<89;k++)
        for (int j=0;j<99;j++)
            for (int i=0;i<100;i++)
                printf("%d\n",a[i][j][k]);

    printf("finished\n");

    return 0;
}



It compiles but I get no output at all:
Quote:

PGI$ pgcc -ta=nvidia,time,keepgpu -Minfo=all,accel trial3DsubsetDiffSizes.c
NOTE: your trial license will expire in 7 days, 13.1 hours.
main:
26, Generating copy(a[50:98][3:69][5:59])
28, Loop is parallelizable
Accelerator kernel generated
28, #pragma acc for parallel, vector(55)
29, Loop is parallelizable
30, Loop is parallelizable
PGI$ trial3DsubsetDiffSizes.exe
PGI$



On the other hand, a similar program compiled with -ta=nvidia,time -Minfo=accel works correctly, but it prints no info messages and no timing info either.

The main idea of my program is similar to the code above: I need to accelerate a 3-level-deep loop over a 3D array, or over a 1D array that uses macros to calculate the 3D index. What's the best way to do that with the accelerator?

Note: the actual program uses dynamically allocated arrays, not statically allocated ones like in this example.

Thanks
mkcolg



Joined: 30 Jun 2004
Posts: 6211
Location: The Portland Group Inc.

Posted: Mon Dec 21, 2009 9:01 am

Hi ghandurah,

Since I'm out of the office for the next two weeks, and Windows requires a user to be logged into the console in order to run on a GPU, I'm not able to recreate your error. The code does seem to run correctly on Linux, though.

Does the code print the values if you compile without "-ta"? How about with just "-ta=nvidia"?

As for the dynamic arrays, you may need to use "-Msafeptr" or add the C99 restrict keyword to each of your pointers. Without this, the compiler must presume that your pointers could overlap and cannot generate accelerator code.
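
As a minimal sketch of the restrict suggestion above (the function name and signature are made up for illustration and are not taken from ghandurah's program): qualifying each pointer parameter with the C99 restrict keyword promises the compiler that the arrays do not overlap, which is the same promise "-Msafeptr" makes for the whole file.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original program.
 * 'restrict' tells the compiler that a and b never alias, so
 * iterations of the loop can be treated as independent. */
void scale_in_place(float *restrict a, const float *restrict b, int n)
{
    /* Under pgcc, the accelerator region would wrap this loop. */
    for (int i = 0; i < n; i++)
        a[i] = a[i] * b[i];
}
```

The promise is the programmer's responsibility: calling such a function with overlapping buffers is undefined behavior, so restrict should only be added where the arrays really are distinct.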

- Mat
ghandurah



Joined: 03 Nov 2009
Posts: 9

Posted: Tue Dec 22, 2009 1:56 am

Quote:

Does the code print the values if you compile without "-ta"? How about with just "-ta=nvidia"?


Nothing at all.
mkcolg



Joined: 30 Jun 2004
Posts: 6211
Location: The Portland Group Inc.

Posted: Wed Dec 23, 2009 9:51 am

Hi ghandurah,

It looks like your program is seg faulting because 'a' is too large. Try reducing the size of 'a' to fewer than 250,000 elements.
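
For scale: the array in the first post has 100*99*89 = 881,100 ints, which is 3,524,400 bytes (about 3.4 MB) with 4-byte ints, and as a local variable it lives on the stack. 250,000 ints is about 1 MB, which fits a default stack of 1 MB on Windows (an assumption about this setup; the thread does not state the stack size). A rough sketch of moving the array to the heap instead, with a hypothetical idx3 helper for the flat index:

```c
#include <stdlib.h>

enum { NI = 100, NJ = 99, NK = 89 };   /* dimensions from the first post */

/* Hypothetical helper: row-major offset of a[i][j][k] in a flat buffer. */
static size_t idx3(int i, int j, int k)
{
    return ((size_t)i * NJ + (size_t)j) * NK + (size_t)k;
}

/* Allocates the array on the heap and fills it with i+j+k, like the
 * original initialization loops. Returns NULL if allocation fails;
 * the caller is responsible for free(). */
int *make_array(void)
{
    int *a = malloc((size_t)NI * NJ * NK * sizeof *a);
    if (a == NULL)
        return NULL;
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)
            for (int k = 0; k < NK; k++)
                a[idx3(i, j, k)] = i + j + k;
    return a;
}
```

Declaring the array static, or at file scope, would also take it off the stack while keeping the a[i][j][k] syntax.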

- Mat
ghandurah



Joined: 03 Nov 2009
Posts: 9

Posted: Wed Dec 23, 2009 11:47 pm

Thanks a lot, mkcolg. I declared it as a dynamic array and it worked correctly.

Sorry, but I have two more questions.

I need to compute some values and then store them as constants, because they'll be used to declare arrays (they represent the dimensions) and as array subscripts in other places. I declared them before the main function as follows:

Code:

const int X=20;
const int Y=10;

const int I=2.5*X+2*Y;

//J same way
//K same way

#define A(i,j,k) A[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way


It gives me this error:
PGC-S-0074-Non-constant expression in initializer ==> pointing to const int I=2.5*X+2*Y;

So I modified it to:

Code:

const int X=20;
const int Y=10;

int Ii=2.5*X+2*Y;

const int I=Ii;

//J same way
//K same way

#define A(i,j,k) A[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way


Still the same error.
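
Both attempts hit the same rule: in C (unlike C++), a const int object is not an integer constant expression, so a file-scope initializer that reads X, Y, or Ii is not constant. One possible workaround, sketched here on the assumption that the dimensions can be written in integer arithmetic, is to use enumeration constants. The values of J and K below are placeholders because the post does not show how they are computed, and IDX and alloc_grid are hypothetical names.

```c
#include <stdlib.h>

/* Enumeration constants are integer constant expressions in C, so they
 * can be used in other initializers, array sizes, and subscripts. */
enum {
    X = 20,
    Y = 10,
    I = 5 * X / 2 + 2 * Y,   /* integer form of 2.5*X + 2*Y */
    J = 8,                   /* placeholder: not given in the post */
    K = 6                    /* placeholder: not given in the post */
};

/* Same flat offset as the post's A(i,j,k) macro, under a separate
 * name so that it does not shadow the array itself. */
#define IDX(i, j, k) ((i) * ((J + 2) * (K + 2)) + (j) * (K + 2) + (k))

/* Hypothetical helper: allocates (I+1)*(J+2)*(K+2) zeroed floats. */
float *alloc_grid(void)
{
    return calloc((size_t)(I + 1) * (J + 2) * (K + 2), sizeof(float));
}
```

Note that 5*X/2 only matches 2.5*X when 5*X is even, since integer division truncates.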

Then I used #define:
Code:

#define X 20
#define  Y 10

#define  I (2.5*X+2*Y)
//J same way
//K same way   
   
#define  X1 (X+1)   
#define   X2 (X+2)

#define A(i,j,k) B[(i)*((J+2)*(K+2))+(j)*(K+2)+(k)]
// other arrays same way



I got this:
Quote:

PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1158)
PGC-W-0046-Non-integral array subscript is cast to int (Acc: 1162)
main:
461, Accelerator region ignored
464, Accelerator restriction: size of the GPU copy of an array depends on values computed in this loop
465, Accelerator restriction: size of the GPU copy of 'A' is unknown
Accelerator restriction: size of the GPU copy of 'B' is unknown
Accelerator restriction: one or more arrays have unknown size


The rest of the code, after the definitions:
Code:

inline void init3Darray (float *arr, int a, int b, int c, float val){
    int index;
    for (int i=1; i<a; i++)
        for (int j=1; j<b; j++)
            for (int k=1; k<c; k++){
                index = i*b*c + j*c + k;
                arr[index] = val;
            }
}

int main(){

    float *A = (float *)malloc((I+1)*(J+2)*(K+2) * sizeof (*A));
    init3Darray(A, I+1, J+2, K+2, 0.0);

    //B same way

#pragma acc region
    {
        //all those constants (P1, KP, etc) are declared the same way as I, J, K
        for (int k=P1; k<=KP; k++)
            for (int j=P1; j<=JP; j++)
                for (int i=2; i<=I; i++)
                    A(i,j,k) = B(i,j,k)*A(i,j,k);
    }

    return 0;
}


Is it better to use "acc region" or "acc for" for loops like the one above? Any recommendations, given that the arrays will actually be much larger?
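
Whichever directive is used, the "size of the GPU copy ... is unknown" restrictions quoted above suggest that the compiler could not work out how much of each malloc'ed array to transfer. A rough sketch of stating the bounds explicitly follows. It assumes the data clauses take the same lower:upper form that the compiler printed in the first post (copy(a[50:98]...)), and the sizes, loop bounds, and the names IDX and multiply are all placeholders rather than the real program's values.

```c
#include <stdlib.h>

/* Placeholder integer sizes standing in for the post's I+1, J+2, K+2. */
enum { NI = 71, NJ = 10, NK = 8 };

/* Flat row-major offset, kept separate from the array names. */
#define IDX(i, j, k) ((i) * (NJ * NK) + (j) * NK + (k))

void multiply(float *restrict A, const float *restrict B)
{
    /* Assumed clause syntax: first and last element to transfer.
     * Compilers without accelerator support ignore this pragma. */
    #pragma acc region copy(A[0:NI*NJ*NK-1]) copyin(B[0:NI*NJ*NK-1])
    {
        for (int i = 2; i < NI; i++)
            for (int j = 1; j < NJ - 1; j++)
                for (int k = 1; k < NK - 1; k++)
                    A[IDX(i, j, k)] = B[IDX(i, j, k)] * A[IDX(i, j, k)];
    }
}
```

All bounds and subscripts here are integers, which also avoids the "Non-integral array subscript" warnings. The loops are ordered so that k, the contiguous index, is innermost; that ordering is generally friendlier to memory access, though whether it helps in this case would need to be checked against the -Minfo output.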

Second question:
I need to use cutil_inline. When I include it like this:
#include <cutil_inline.h>

the compiler doesn't find it. So I copied the cutil_inline.h file to the same directory and used #include "cutil_inline.h". That file was then found, but the other files it references were not.

Thanks
All times are GMT - 7 Hours
Page 1 of 2

 