PGI User Forum

loop carried dependence

 
ink



Joined: 25 Nov 2008
Posts: 8

Posted: Fri Sep 11, 2009 7:40 am    Post subject: loop carried dependence

I fail to understand where the "Complex loop carried dependence" is in the code below. If I leave only one of the arrays, everything is fine, but not if both are present.
===================================
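#include <stdlib.h>
#define COUNT 16384

int main(int argc, char **argv){
    float *d;
    float *d2;

    d = (float *)malloc( COUNT * sizeof(float));
    d2 = (float *)malloc( COUNT * sizeof(float));

    #pragma acc region for              /* line 11 in the compiler messages */
    for(int i = 0; i < COUNT; i++) {    /* line 12 in the compiler messages */
        float sum = 0.0;
        float sum2 = 0.0;
        d[i] = sum;                     /* store through the first pointer */
        d2[i] = sum2;                   /* store through the second pointer */
    }

    free(d);
    free(d2);
    return 0;
}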
===================================
pgcc -ta=nvidia -Minfo=accel -o t.exe t.c
main:
     11, No parallel kernels found, accelerator region ignored
     12, Complex loop carried dependence of d prevents parallelization
         Loop carried dependence of d2 prevents parallelization
         Loop carried backward dependence of d2 prevents vectorization
mkcolg



Joined: 30 Jun 2004
Posts: 6072
Location: The Portland Group Inc.

Posted: Fri Sep 11, 2009 10:45 am    Post subject: loop carried dependence

Hi ink,

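You need to compile with "-Msafeptr" to assert that your pointers don't overlap. Without that assertion the compiler has to assume that d and d2 may point into the same memory, so a store through one pointer in one iteration could touch a location that another iteration stores through the other pointer. That possible overlap is the dependence being reported, which is also why it goes away when only one array is present.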
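
To make the overlap concern concrete, here is a small sketch in plain C (no accelerator involved). The function names `fill_forward` and `fill_backward` and the values passed to them are made up for illustration; they are not part of the original program. Both functions perform the same two stores as the loop in the question, only in a different iteration order.

```c
#include <stddef.h>

/* Hypothetical helper: same two stores as the loop in the question,
   iterations executed in ascending order. Distinct values a and b are
   used so that the effect of overlapping pointers becomes visible. */
void fill_forward(float *d, float *d2, int n, float a, float b)
{
    for (int i = 0; i < n; i++) {
        d[i] = a;
        d2[i] = b;
    }
}

/* Hypothetical helper: identical stores, iterations in descending order. */
void fill_backward(float *d, float *d2, int n, float a, float b)
{
    for (int i = n - 1; i >= 0; i--) {
        d[i] = a;
        d2[i] = b;
    }
}
```

If d and d2 point to separate buffers, both functions leave the same contents behind. If they overlap, for example a five-element array buf with d = buf, d2 = buf + 1, n = 4, a = 1 and b = 2, the forward version leaves 1 1 1 1 2 while the backward version leaves 1 2 2 2 2. The result depends on the order of the iterations, and a compiler that cannot rule out such overlap cannot safely run the iterations in parallel.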

Code:
% pgcc -ta=nvidia -c test.c -Minfo=accel -V9.0-3
main:
     11, No parallel kernels found, accelerator region ignored
     12, Complex loop carried dependence of d prevents parallelization
         Loop carried dependence of d2 prevents parallelization
         Loop carried backward dependence of d2 prevents vectorization

% pgcc -ta=nvidia,time -o test.out test.c -Minfo=accel -Msafeptr -V9.0-3
main:
     11, Generating copyout(d[0:16383])
         Generating copyout(d2[0:16383])
     12, Loop is parallelizable
         Accelerator kernel generated
         12, #pragma for parallel, vector(256)
% test.out
Accelerator Kernel Timing data
test.c
  main
    11: region entered 1 time
        time(us): total=3895725 init=3895154 region=571
                  kernels=26 data=545
        w/o init: total=571 max=571 min=571 avg=571
        12: kernel launched 1 times
            grid: [64]  block: [256]
            time(us): total=26 max=26 min=26 avg=26
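
A possible source-level alternative to the command-line flag is the C99 `restrict` qualifier, which makes the same no-overlap promise for individual pointers instead of for the whole file. The sketch below is an assumption about how that could look, not something compiled with pgcc 9.0-3 for this thread: the function name `fill_zero` is hypothetical, the loop is moved into a function only so the qualifier can sit on the parameters, and whether the compiler then reports "Loop is parallelizable" would need to be checked with -Minfo=accel. The pragma is guarded with the `__PGI` macro so that other compilers simply skip it.

```c
#include <stddef.h>

/* Hypothetical helper. restrict is a promise from the programmer that,
   inside this function, d and d2 never refer to the same memory.
   Breaking that promise is undefined behavior. */
void fill_zero(float *restrict d, float *restrict d2, int n)
{
#ifdef __PGI
#pragma acc region for
#endif
    for (int i = 0; i < n; i++) {
        d[i] = 0.0f;   /* store through the first pointer */
        d2[i] = 0.0f;  /* store through the second pointer */
    }
}
```

The caller would pass the two malloc'ed buffers from the original program, for example fill_zero(d, d2, COUNT). Compared with "-Msafeptr", this keeps the assertion next to the pointers it applies to, at the cost of changing the source.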


Hope this helps,
Mat
ink



Joined: 25 Nov 2008
Posts: 8

Posted: Tue Sep 15, 2009 1:54 am    Post subject: loop carried dependence

Hi Mat,
Thanks for your help. It worked indeed.
All I need now is support for derived types and unsigned int.
Thanks again.
All times are GMT - 7 Hours