PGI User Forum

CUDA-x86.

Generating SSE code for blocks.

 
alienanthill



Joined: 11 Jul 2011
Posts: 1

Posted: Mon Jul 11, 2011 7:42 am    Post subject: Generating SSE code for blocks.

Hi,

I am trying to get the PGI compiler to generate good SSE code, but I am running into issues: the compiler refuses to generate SSE code for a block of code such as this:



Code:

  for (t4 = 0; t4 <= 14; t4++)
      {
        #pragma ivdep
//   #pragma vector aligned
        z[0] = z[0] + A[t4 * 15 + 0] * x[t4];
        z[0 + 1] = z[0 + 1] + A[t4 * 15 + 0 + 1] * x[t4];
        z[0 + 2] = z[0 + 2] + A[t4 * 15 + 0 + 2] * x[t4];
        z[0 + 3] = z[0 + 3] + A[t4 * 15 + 0 + 3] * x[t4];
        z[0 + 4] = z[0 + 4] + A[t4 * 15 + 0 + 4] * x[t4];
        z[0 + 5] = z[0 + 5] + A[t4 * 15 + 0 + 5] * x[t4];
        z[0 + 6] = z[0 + 6] + A[t4 * 15 + 0 + 6] * x[t4];
        z[0 + 7] = z[0 + 7] + A[t4 * 15 + 0 + 7] * x[t4];
        z[0 + 8] = z[0 + 8] + A[t4 * 15 + 0 + 8] * x[t4];
        z[0 + 9] = z[0 + 9] + A[t4 * 15 + 0 + 9] * x[t4];
        z[0 + 10] = z[0 + 10] + A[t4 * 15 + 0 + 10] * x[t4];
        z[0 + 11] = z[0 + 11] + A[t4 * 15 + 0 + 11] * x[t4];
        z[0 + 12] = z[0 + 12] + A[t4 * 15 + 0 + 12] * x[t4];
        z[0 + 13] = z[0 + 13] + A[t4 * 15 + 0 + 13] * x[t4];
        z[0 + 14] = z[0 + 14] + A[t4 * 15 + 0 + 14] * x[t4];
      }


I can re-roll the entire block into a loop. When I do this, the compiler unrolls the loop and vectorizes it, but it uses only two SSE registers, which limits instruction-level parallelism. Is there a way to get around this? The block contains many independent instructions that are well suited to SSE.

Thanks,
Shreyas
mkcolg



Joined: 30 Jun 2004
Posts: 6122
Location: The Portland Group Inc.

Posted: Mon Jul 11, 2011 10:42 am

Hi Shreyas,

The above code won't vectorize because of the data dependency on z: every iteration of the t4 loop reads and updates all of z[0] through z[14]. I'm assuming your re-rolled version contains the same dependency? An example would be helpful.

- Mat
All times are GMT - 7 Hours