Hi!
I am a little bit confused about a certain issue:
I am porting an unstructured grid application. In order to have coalesced memory access, I generated a new vector (Q_GPU_kc). It is ordered wit ...
I am porting a large code to the GPU with the PGI Acc Model. Currently running 14x...
Now I want to do some fine-tuning. I have 5 loops that are not performing well yet. The Loop ...