The code now runs almost perfectly. The problem was that the nested loops were not rectangular. This produced strange behavior: the loop bounds were set to zero and the loops never exe ...
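For anyone hitting the same thing, here is a minimal sketch of what "making the loops rectangular" can look like. The original loops are not shown in this thread, so everything below is an assumption for illustration: the function names, the triangular iteration space, and the OpenACC-style directive (the directive set actually used may differ). The idea is to give every loop fixed bounds and move the dependence on the outer index into a guard.

```c
#include <stddef.h>

/* Hypothetical example: sum the lower triangle of an n x n row-major matrix. */

/* Non-rectangular nest: the inner bound depends on the outer index i. */
double sum_lower_triangular(const double *a, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j <= i; ++j)
            sum += a[(size_t)i * n + j];
    return sum;
}

/* Rectangular nest: both loops run over fixed bounds 0..n-1 and a guard
 * restores the triangular shape. The directive is an assumed OpenACC form;
 * a compiler without OpenACC support simply ignores it. */
double sum_lower_rectangular(const double *a, int n)
{
    double sum = 0.0;
    #pragma acc parallel loop collapse(2) reduction(+:sum) copyin(a[0:n*n])
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (j <= i)
                sum += a[(size_t)i * n + j];
    return sum;
}
```

The rectangular version does roughly twice as many iterations, but the guard is cheap and the compiler can see the full iteration space up front.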
We currently max out at 7 loop levels (though we are in the process of expanding this), but since you're only at 5-6 levels, this shouldn't matter. Something else is going on.
I am currently working with #pragma directives on a CUDA accelerator. It works rather smoothly, but lately I have run into some curious behavior. The code has at least 5-6 levels of nested loops but the c ...