deeppow
Joined: 02 Feb 2012 Posts: 51
Posted: Tue Mar 05, 2013 11:36 am Post subject: unrolling or data dependent loops
I have a typical loop, for example:
  DO k = 1, JFLagcnt
    j = LSTjflag(k)
    w(j) = (w(j)+B(j)*p(j))/B(j)
  ENDDO
Initially the compiler reports "Loop not vectorized: data dependency" and "Loop unrolled 4 times", so I tried a compiler directive:
!pgi$l nodepchk
  DO k = 1, JFLagcnt
    j = LSTjflag(k)
    w(j) = (w(j)+B(j)*p(j))/B(j)
  ENDDO
and now get the compiler output "Loop not parallelized: innermost", "Loop not vectorized: may not be beneficial", and "Loop unrolled 4 times".
The loop runs over more than 12000 elements. What do I need to change to get it to vectorize? There are a number of loops like this, so it will affect run time.
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Tue Mar 05, 2013 3:03 pm Post subject:
Hi deeppow,
I think your code is fine; the compiler just isn't tuned to vectorize loops whose indices come from a look-up table. I've added a feature request (TPR#19181), and our engineers will see what we can do.
Thanks,
Mat
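A manual workaround sometimes tried for look-up-table loops like the one above is to gather the indexed elements into contiguous temporaries, run the arithmetic over those, and scatter the results back. The sketch below is only an illustration, not PGI's recommendation: the array size, test data, and temporaries `wt`, `bt`, `pt` are hypothetical, and it assumes `LSTjflag` contains no duplicate indices (which the scatter requires). Whether it runs faster than the original loop depends on the compiler and hardware, so it would need to be timed.

```fortran
program gather_scatter_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 16                 ! hypothetical array size
  real(dp) :: w(n), B(n), p(n), w_ref(n)
  real(dp), allocatable :: wt(:), bt(:), pt(:) ! hypothetical contiguous temporaries
  integer, allocatable :: LSTjflag(:)
  integer :: i, j, k, JFLagcnt

  ! Hypothetical test data; B stays nonzero because the loop divides by it.
  do i = 1, n
     w(i) = real(i, dp)
     B(i) = 2.0_dp + real(i, dp)
     p(i) = 0.5_dp * real(i, dp)
  end do

  ! Hypothetical index list: every third element, no duplicates.
  LSTjflag = [(i, i = 1, n, 3)]
  JFLagcnt = size(LSTjflag)

  ! Reference result: the original indirectly indexed loop.
  w_ref = w
  do k = 1, JFLagcnt
     j = LSTjflag(k)
     w_ref(j) = (w_ref(j) + B(j)*p(j)) / B(j)
  end do

  ! Gather the selected elements into contiguous arrays.
  wt = w(LSTjflag)
  bt = B(LSTjflag)
  pt = p(LSTjflag)

  ! Unit-stride loop with no indirect addressing.
  do k = 1, JFLagcnt
     wt(k) = (wt(k) + bt(k)*pt(k)) / bt(k)
  end do

  ! Scatter back; valid only because LSTjflag has no repeated indices.
  w(LSTjflag) = wt

  if (maxval(abs(w - w_ref)) > 1.0e-12_dp) error stop "mismatch"
  print *, "gather/scatter result matches the indirect loop"
end program gather_scatter_sketch
```

The gather and scatter steps still perform indirect accesses, so the benefit, if any, comes from the arithmetic loop becoming unit-stride.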
deeppow
Joined: 02 Feb 2012 Posts: 51
Posted: Tue Mar 05, 2013 3:11 pm Post subject:
Mat,
I use indirect indexing to avoid repeated IF tests: do the test once and store the result for reuse. It's an old method. Is there a better way these days?
-ralph
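For readers unfamiliar with the "test once, store for reuse" approach, here is a minimal sketch that builds the index list once with the standard `PACK` intrinsic and compares it against a masked `WHERE` form of the same update. The logical array `flag`, the array size, and the test data are hypothetical. The sketch only checks that the two forms agree; it makes no claim about which one a given compiler vectorizes better.

```fortran
program index_list_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 10                 ! hypothetical array size
  real(dp) :: B(n), p(n), w_list(n), w_mask(n)
  logical :: flag(n)                           ! hypothetical result of the IF test
  integer, allocatable :: LSTjflag(:)
  integer :: i, j, k, JFLagcnt

  ! Hypothetical test data; B stays nonzero because the update divides by it.
  do i = 1, n
     B(i) = 1.0_dp + real(i, dp)
     p(i) = 0.25_dp * real(i, dp)
     w_list(i) = real(n - i, dp)
  end do
  w_mask = w_list

  ! Hypothetical test: select the even-numbered elements.
  flag = (mod([(i, i = 1, n)], 2) == 0)

  ! Do the test once and store the selected indices for reuse.
  LSTjflag = pack([(i, i = 1, n)], flag)
  JFLagcnt = size(LSTjflag)

  ! Indirectly indexed form, as in the original post.
  do k = 1, JFLagcnt
     j = LSTjflag(k)
     w_list(j) = (w_list(j) + B(j)*p(j)) / B(j)
  end do

  ! Masked form: same update, expressed over the full arrays.
  where (flag) w_mask = (w_mask + B*p) / B

  if (maxval(abs(w_list - w_mask)) > 1.0e-12_dp) error stop "mismatch"
  print *, "index-list and masked forms agree"
end program index_list_sketch
```

The masked form touches every element, so it is likely to pay off only when a large fraction of elements is selected; for sparse selections the index list does less work.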
deeppow
Joined: 02 Feb 2012 Posts: 51
Posted: Tue Mar 05, 2013 3:33 pm Post subject:
An additional odd problem shows up with
  DO k = 1, JFLagcnt
    j = LSTjflag(k)
    u(j) = r(j) + beta*h(j)
    p(j) = u(j) + beta*(beta*p(j)+h(j))
  ENDDO
which produces the compiler output "Loop not vectorized: data dependency" and "Loop unrolled 2 times". Most data dependency failures, such as the one noted above, result in unrolling by 4. Even though the default is 4, I tried to force it with the compiler option "-Munroll=c:4", which, as one might expect, doesn't change the behavior.
-ralph
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Wed Mar 06, 2013 11:02 am Post subject:
Hi Ralph,
Try using "-Munroll=n:4" or "-Munroll=m:4" instead. The "c" option sets the maximum loop count for which a loop is completely unrolled. "n" sets the unroll factor for single-block loops, while "m" sets the factor for multi-block loops.
Hope this helps,
Mat
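To make "unroll factor" concrete, the sketch below hand-unrolls a simplified version of the loop by 4, with a remainder loop for the leftover iterations. It illustrates the general transformation only and is not the code PGI actually generates. The array size, test data, and the helper variable `nmain` are hypothetical, and only the `u(j)` update is kept for brevity.

```fortran
program unroll_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 11                 ! hypothetical array size
  real(dp), parameter :: beta = 0.5_dp         ! hypothetical value
  real(dp) :: r(n), h(n), u(n), u_ref(n)
  integer, allocatable :: LSTjflag(:)
  integer :: i, j, k, JFLagcnt, nmain

  ! Hypothetical test data.
  do i = 1, n
     r(i) = real(i, dp)
     h(i) = 0.1_dp * real(i, dp)
  end do

  ! Hypothetical index list: the odd-numbered elements (6 entries).
  LSTjflag = [(i, i = 1, n, 2)]
  JFLagcnt = size(LSTjflag)

  ! Rolled loop: one element per iteration.
  u_ref = 0.0_dp
  do k = 1, JFLagcnt
     j = LSTjflag(k)
     u_ref(j) = r(j) + beta*h(j)
  end do

  ! Unrolled by 4: four elements per iteration of the main loop.
  u = 0.0_dp
  nmain = JFLagcnt - mod(JFLagcnt, 4)          ! largest multiple of 4 <= JFLagcnt
  do k = 1, nmain, 4
     j = LSTjflag(k);     u(j) = r(j) + beta*h(j)
     j = LSTjflag(k + 1); u(j) = r(j) + beta*h(j)
     j = LSTjflag(k + 2); u(j) = r(j) + beta*h(j)
     j = LSTjflag(k + 3); u(j) = r(j) + beta*h(j)
  end do

  ! Remainder loop: the iterations left over after the unrolled part.
  do k = nmain + 1, JFLagcnt
     j = LSTjflag(k)
     u(j) = r(j) + beta*h(j)
  end do

  if (maxval(abs(u - u_ref)) > 1.0e-12_dp) error stop "mismatch"
  print *, "unrolled loop matches the rolled loop"
end program unroll_sketch
```

Unrolling reduces loop overhead and gives the scheduler more independent work per iteration, but it is a separate optimization from vectorization, so a higher unroll factor does not by itself make the loop vectorize.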