PGI User Forum

compiler output

 
THX 1138



Joined: 30 Jun 2011
Posts: 102

Posted: Wed Oct 03, 2012 1:34 pm    Post subject: compiler output

167, Generating compute capability 1.3 binary
Generating compute capability 2.0 binary
168, Accelerator restriction: size of the GPU copy of 'PWRSUM' is unknown
Accelerator restriction: size of the GPU copy of 'PHSSUM' is unknown
Complex loop carried dependence of '*(PWRSUM)' prevents parallelization
Loop carried dependence of '*(PHSSUM)' prevents parallelization
Loop carried backward dependence of '*(PHSSUM)' prevents vectorization
Conditional loop will be executed in scalar mode
Accelerator kernel generated
168, CC 1.3 : 6 registers; 40 shared, 0 constant, 0 local memory bytes
CC 2.0 : 6 registers; 0 shared, 56 constant, 0 local memory bytes
Generating copyout(PWRSUM[0:])
Generating copyout(PHSSUM[0:])
178, Accelerator region ignored
179, Accelerator restriction: invalid loop
203, Accelerator restriction: size of the GPU copy of 'Env' is unknown
Accelerator restriction: invalid loop
204, Accelerator restriction: datatype not supported: SCST
238, Accelerator restriction: size of the GPU copy of 'URAYu' is unknown
Accelerator restriction: size of the GPU copy of 'FRAYu' is unknown
288, Accelerator restriction: size of the GPU copy of 'PWRSUM' is unknown
291, Accelerator restriction: size of the GPU copy of 'PHSSUM' is unknown
311, Accelerator restriction: size of the GPU copy of 'PHSSUM' is unknown

In the compiler output shown above, I am not sure what it is telling me. For instance in line 178 it says "accelerator region ignored", and line 179 " loop is invalid". Why? This code compiles, so how could it be an invalid loop? And why was the accelerator region ignored in line 178?

In the earlier comment, line 168, it talks about preventing parallelization. It still generates an accelerator kernel?

Thanks in advance.

THX 1138
THX 1138



Joined: 30 Jun 2011
Posts: 102

Posted: Thu Oct 04, 2012 8:48 am    Post subject: compiler output

Okay, I think I see the answer to the first question. The compiler generates a sequential kernel in this case, not a parallel kernel. I get it. Maybe the message should state the type of kernel instead of just saying that a kernel was generated.

I still need help on the other question, however.

Thanks in advance.

Newport_j
mkcolg



Joined: 30 Jun 2004
Posts: 5815
Location: The Portland Group Inc.

Posted: Thu Oct 04, 2012 1:04 pm

Hi Newport_j,

Quote:
Accelerator restriction: size of the GPU copy of 'PHSSUM' is unknown
This one is hopefully self-explanatory. The compiler can't tell how big the array is, so you need to use one of the copy clauses to explicitly indicate how much to copy over to the GPU.
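
As a rough illustration of that in OpenACC C (only a sketch: the function name init_sums and the length parameter n are made up here and are not taken from the poster's code), the array section [0:n], read as start:length, tells the compiler how many elements to copy back:

```c
#include <assert.h>

/* Hypothetical example: names and sizes are assumptions, not the poster's code.
 * A bare pointer carries no size, so the array section [0:n] (start:length)
 * states how many elements to copy out. A compiler without OpenACC support
 * ignores the pragma and runs the loop on the host. */
void init_sums(float *pwrsum, float *phssum, int n)
{
    assert(n >= 0);
    #pragma acc parallel loop copyout(pwrsum[0:n], phssum[0:n])
    for (int i = 0; i < n; ++i) {
        pwrsum[i] = 0.0f;
        phssum[i] = (float)i;
    }
}
```

With bounds given this way, the "size of the GPU copy ... is unknown" restriction should no longer apply to these two arrays, although other restrictions may still be reported.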

Quote:
Complex loop carried dependence of '*(PWRSUM)' prevents parallelization
Loop carried dependence of '*(PHSSUM)' prevents parallelization
If you have two pointers in C, it's possible that they point at the same memory or overlap in memory. Overlapping memory prevents parallelization. However, the compiler can't tell at compile time whether the memory overlaps, so it must assume that it does. The C99 "restrict" keyword should be used to assert that the pointers don't overlap, or you can use the flag "-Msafeptr" to declare that all pointers are independent.
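
A minimal sketch of the "restrict" approach, assuming a made-up function named scale (none of these names come from the original code):

```c
#include <assert.h>

/* Hypothetical example. "restrict" (C99) is a promise from the programmer
 * that out and in never overlap; the compiler does not verify it, so passing
 * overlapping buffers here would be an error on the caller's side. */
void scale(float *restrict out, const float *restrict in, float factor, int n)
{
    assert(n >= 0);
    #pragma acc kernels loop copyin(in[0:n]) copyout(out[0:n])
    for (int i = 0; i < n; ++i) {
        out[i] = factor * in[i];
    }
}
```

The qualifier goes on the pointer declarations themselves, so it only covers the pointers you mark, whereas "-Msafeptr" applies the same assumption to the whole compilation.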

Another possible reason for this message is that you're using a calculated index, e.g. "idx = getIndex(i,j); PWRSUM[idx] = ... ;". In this case, the compiler can't prove that different iterations compute different index values, so it must assume they could be the same, and hence that the code is not parallel. You then need to add the "#pragma acc loop independent" directive to assert to the compiler that the code is parallel. Note that this is only required for the "kernels" model. The "parallel" model assumes loops are independent.
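
Here is one way the computed-index case could look. This is a sketch under assumptions: get_index and fill are invented for illustration, the helper is assumed to return a distinct offset for every (i, j) pair, and the "routine" directive assumes a compiler supporting OpenACC 2.0 or later (older compilers would need the helper inlined).

```c
#include <assert.h>

/* Hypothetical index helper: maps (i, j) to a unique flat offset.
 * "acc routine seq" makes it callable from device code (OpenACC 2.0+). */
#pragma acc routine seq
static int get_index(int i, int j, int ncols)
{
    return i * ncols + j;
}

void fill(float *restrict pwrsum, int nrows, int ncols)
{
    assert(nrows >= 0 && ncols >= 0);
    #pragma acc kernels copyout(pwrsum[0:nrows*ncols])
    {
        /* "independent" is the programmer's assertion that no two
         * iterations write the same element; the compiler trusts it. */
        #pragma acc loop independent
        for (int i = 0; i < nrows; ++i) {
            #pragma acc loop independent
            for (int j = 0; j < ncols; ++j) {
                int idx = get_index(i, j, ncols);
                pwrsum[idx] = (float)(i + j);
            }
        }
    }
}
```

If the helper could ever return the same index for two different iterations, the assertion would be false and the results would be undefined, so it is worth checking the index function before adding the directive.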

Another reason could be that there really is a loop carried dependency, in which case you'll need to change your code. I'd need a code snippet to tell, though.

Quote:
For instance in line 178 it says "accelerator region ignored", and line 179 " loop is invalid". Why?
I'd need a code snippet to tell. But things like "while" loops, or an implicit shallow copy of a struct, would get this message.
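
To make the "while" case concrete, here is a sketch (all names are hypothetical, and whether a given loop is accepted depends on the compiler version) of an uncounted loop next to an equivalent counted "for" loop whose trip count is known on entry:

```c
#include <assert.h>

/* Uncounted form: the trip count is hidden in the exit test. This is the
 * kind of loop that may be reported as invalid, so it is left host-only. */
void halve_while(float *data, int n)
{
    int i = 0;
    while (i < n) {
        data[i] *= 0.5f;
        ++i;
    }
}

/* Counted form: start, bound and step are all visible in the loop header,
 * so the iterations can be divided among GPU threads. */
void halve_for(float *data, int n)
{
    assert(n >= 0);
    #pragma acc parallel loop copy(data[0:n])
    for (int i = 0; i < n; ++i) {
        data[i] *= 0.5f;
    }
}
```

Both functions compute the same result on the host; only the second gives the compiler a loop shape it can map onto the accelerator.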

Quote:
It still generates an accelerator kernel?
If you're using the "parallel" model, a kernel will be generated.

- Mat