alienanthill
Joined: 11 Jul 2011 Posts: 1
Posted: Mon Jul 11, 2011 7:42 am    Post subject: Generating SSE code for blocks.
Hi,
I am trying to get the PGI compiler to generate good SSE code, but I am running into problems: the compiler refuses to generate SSE code for a block such as this one.
Code:
// loop pragmas apply to the loop that follows them, so they go before the for
#pragma ivdep
// #pragma vector aligned
for (t4 = 0; t4 <= 14; t4++)
{
    z[0] = z[0] + A[t4 * 15 + 0] * x[t4];
    z[0 + 1] = z[0 + 1] + A[t4 * 15 + 0 + 1] * x[t4];
    z[0 + 2] = z[0 + 2] + A[t4 * 15 + 0 + 2] * x[t4];
    z[0 + 3] = z[0 + 3] + A[t4 * 15 + 0 + 3] * x[t4];
    z[0 + 4] = z[0 + 4] + A[t4 * 15 + 0 + 4] * x[t4];
    z[0 + 5] = z[0 + 5] + A[t4 * 15 + 0 + 5] * x[t4];
    z[0 + 6] = z[0 + 6] + A[t4 * 15 + 0 + 6] * x[t4];
    z[0 + 7] = z[0 + 7] + A[t4 * 15 + 0 + 7] * x[t4];
    z[0 + 8] = z[0 + 8] + A[t4 * 15 + 0 + 8] * x[t4];
    z[0 + 9] = z[0 + 9] + A[t4 * 15 + 0 + 9] * x[t4];
    z[0 + 10] = z[0 + 10] + A[t4 * 15 + 0 + 10] * x[t4];
    z[0 + 11] = z[0 + 11] + A[t4 * 15 + 0 + 11] * x[t4];
    z[0 + 12] = z[0 + 12] + A[t4 * 15 + 0 + 12] * x[t4];
    z[0 + 13] = z[0 + 13] + A[t4 * 15 + 0 + 13] * x[t4];
    z[0 + 14] = z[0 + 14] + A[t4 * 15 + 0 + 14] * x[t4];
}
I can re-roll the entire block into a loop. When I do, the compiler unrolls the loop and vectorizes it, but it uses only two SSE registers, which limits instruction-level parallelism. Is there a way around this? The block contains many independent operations that seem ideal for SSE.
Thanks,
Shreyas
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Mon Jul 11, 2011 10:42 am
Hi Shreyas,
The code above won't vectorize because of the data dependency on z. I assume your re-rolled version contains the same dependency? An example would be helpful.
- Mat