PGI User Forum
Loop unrolling
AndyP



Joined: 14 Mar 2011
Posts: 10
Location: Daresbury Lab, UK

Posted: Wed Mar 30, 2011 6:08 am    Post subject: Loop unrolling

Hello,

I have a loop with a trip count of just 2 that sits at the heart of a triply-nested loop. I think the best thing to do is to unroll this loop, but the compiler (v11.3) rejects my !$ACC DO SEQ, UNROLL(2). If I use !$ACC DO UNROLL(2) instead, it doesn't complain, but its output tells me that it has used DO VECTOR(2). Is that because that is a better option and it has silently ignored my advice? I was thinking there may be a loop set-up overhead that I could avoid by unrolling...

Thanks for any insight,

Andy.
AndyP



Joined: 14 Mar 2011
Posts: 10
Location: Daresbury Lab, UK

Posted: Wed Mar 30, 2011 6:53 am    Post subject: Re: Loop unrolling

Just to update my own post: I've gone through and manually unrolled those small loops, and I see a speed-up of roughly a factor of two on quite a few of them.
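
For readers unfamiliar with the transformation, here is a minimal C sketch of manually unrolling a trip-count-2 loop nested inside three outer loops. The array shape, the names sum_rolled and sum_unrolled, and the summation body are all hypothetical; the original code is Fortran and is not shown in this thread.

```c
#define NI 4
#define NJ 3
#define NK 2

/* Rolled form: the innermost loop runs exactly twice per (k, j, i). */
double sum_rolled(double a[NK][NJ][NI][2]) {
    double total = 0.0;
    for (int k = 0; k < NK; k++)
        for (int j = 0; j < NJ; j++)
            for (int i = 0; i < NI; i++)
                for (int t = 0; t < 2; t++)
                    total += a[k][j][i][t];
    return total;
}

/* Manually unrolled form: the two iterations are written out,
   so the innermost loop counter and its branch disappear.
   The additions happen in the same order as in sum_rolled. */
double sum_unrolled(double a[NK][NJ][NI][2]) {
    double total = 0.0;
    for (int k = 0; k < NK; k++)
        for (int j = 0; j < NJ; j++)
            for (int i = 0; i < NI; i++) {
                total += a[k][j][i][0];
                total += a[k][j][i][1];
            }
    return total;
}
```

Both functions return the same value for the same input; whether the unrolled form is actually faster depends on the compiler and the target, so it is worth timing both.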
mkcolg



Joined: 30 Jun 2004
Posts: 6697
Location: The Portland Group Inc.

Posted: Wed Mar 30, 2011 8:22 am    Post subject:

Hi Andy,

The semantics of the unroll clause changed a bit: the unroll clause is now applied to the preceding loop schedule clause. Hence, "!$ACC DO UNROLL(2)" has no meaning. "!$ACC DO SEQ, UNROLL(2)" means apply the unroll to the sequential loop, which I don't think would apply since you don't have a sequential loop.

Instead, try adding "!$ACC DO VECTOR(16) UNROLL(2)" before line 246 (i.e. the inner loop) and see if that gets the desired unrolling.

- Mat
AndyP



Joined: 14 Mar 2011
Posts: 10
Location: Daresbury Lab, UK

Posted: Thu Mar 31, 2011 2:36 am    Post subject:

Hi Mat,
Quote:

Instead, try adding "!$ACC DO VECTOR(16) UNROLL(2)" before line 246 (i.e. the inner loop) and see if that gets the desired unrolling.

I don't understand this idea - could you explain it? I've tried it by leaving directives off the outer three loops and putting only the directive you've suggested on the inner loop (what was 246 but is now 268). The compiler responds with:
Code:

    268, Loop is parallelizable
         Accelerator kernel generated
        259, !$acc do parallel ! blockidx%y
        263, !$acc do parallel, vector(2) ! blockidx%x threadidx%z
        265, !$acc do vector(8) ! threadidx%x
             Cached references to size [8x2] block of 'e3u'
             Cached references to size [8x2] block of 'e2u'
             Cached references to size [8x2] block of 'ahtu'
             Cached references to size [8x2] block of 'e1u'
             Cached references to size [8x2] block of 'uslp'
             Cached references to size [8x2] block of 'umask'
             Cached references to size [9x3x2] block of 'tmask'
             Cached references to size [8x2] block of 'e3v'
             Cached references to size [8x2] block of 'e1v'
             Cached references to size [8x2] block of 'ahtv'
             Cached references to size [8x2] block of 'e2v'
             Cached references to size [8x2] block of 'vslp'
             Cached references to size [8x2] block of 'vmask'
        268, !$acc do vector(32) unroll(2) ! threadidx%y
             Cached references to size [9x3x32] block of 'zdk1t'
             Cached references to size [9x3x32] block of 'zdkt'
             CC 1.3 : 64 registers; 15816 shared, 1192 constant, 328 local memory bytes; 25% occupancy
             CC 2.0 : 63 registers; 15800 shared, 1200 constant, 0 local memory bytes; 33% occupancy

and the resulting code is very slow.
I'll email you the code (same one as in my other post).

Thanks for your help,

Andy.
xray



Joined: 21 Jan 2010
Posts: 85

Posted: Mon Jan 30, 2012 9:10 am    Post subject:

Hi,
Any results on this problem? I think I have a similar issue. I have an outer loop that is distributed across the threads and thread blocks, and an inner loop that should be executed serially by each thread:

Code:
#pragma acc region
{
  #pragma acc for parallel vector(32)
  for (i = 0; i < n; i++) {
    // do something
    for (j = 0; j < m; j++) { /* do something */ }
  }
}


Now I am trying to unroll the inner loop, but my unroll directives are always ignored. I tried:
1) #pragma acc seq unroll(4)
2) #pragma unroll(4)

What is the problem? If I manually unroll the inner loop, I get a nice speedup...
Bye, Sandra
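
As an illustration of the manual unrolling mentioned above, here is a minimal C sketch that unrolls a serial inner loop by a factor of 4 and handles the leftover iterations in a remainder loop. The row-sum body, the flat row-major layout, and the names row_sums_rolled and row_sums_unrolled4 are assumptions made for the example; the actual loop body is not shown in the post, and the accelerator directives are left out so the sketch compiles with any C compiler.

```c
#include <stddef.h>

/* Rolled reference: the inner j loop runs serially for each i. */
void row_sums_rolled(const float *a, float *out, int n, int m) {
    for (int i = 0; i < n; i++) {
        float s = 0.0f;
        for (int j = 0; j < m; j++)
            s += a[(size_t)i * m + j];
        out[i] = s;
    }
}

/* Inner loop manually unrolled by 4. The additions keep the same
   order as the rolled version, so the results are identical. */
void row_sums_unrolled4(const float *a, float *out, int n, int m) {
    for (int i = 0; i < n; i++) {
        const float *row = a + (size_t)i * m;
        float s = 0.0f;
        int j = 0;
        /* Main body: four iterations per trip. */
        for (; j + 3 < m; j += 4) {
            s += row[j];
            s += row[j + 1];
            s += row[j + 2];
            s += row[j + 3];
        }
        /* Remainder: at most three leftover iterations when m % 4 != 0. */
        for (; j < m; j++)
            s += row[j];
        out[i] = s;
    }
}
```

The remainder loop is what makes the unrolled version safe for any m; if m is known to be a multiple of 4, it can be dropped.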
All times are GMT - 7 Hours