TheMatt
Joined: 06 Jul 2009 Posts: 263 Location: Greenbelt, MD
Posted: Tue Sep 01, 2009 8:19 am Post subject: Loop carried reuse prevents parallelization
As I'm trying to learn to rewire my brain for parallel thinking, I've been trying various things to reduce the number of "loop carried dependence", "loop carried reuse" and other issues reported by -Minfo=accel. One particular loop has been stymieing me, so I'm coming here to try and figure it out.
To wit, the loop:
```
217  do i=1,m
218     do k=0,np
219        fsdir(i)=tda(i,k,2)
220     enddo
221  enddo
```
where those are line numbers, not statement labels.
By the time the code gets to here, tda has been constructed, and fsdir has not appeared anywhere else (and never does again). Also, tda(:,:,:) is local to the whole !$acc region and fsdir(:) is copyout.
When the compiler gets here it says:
```
217, Loop is parallelizable
218, Loop carried reuse of fsdir prevents parallelization
     Inner sequential loop scheduled on accelerator
     Accelerator kernel generated
217, !$acc do parallel, vector(256)
     Using register for 'fsdir'
218, !$acc do seq
```
I guess I'm confused as to why this is not scheduled as "parallel, vector(16)" on both loops (a 16x16 block), as I'm used to seeing in cases like this. Is it because fsdir(:) is a copyout array and as such has internal restrictions regarding memory layout or the like? (And, of course, it may be that this is faster than the 16x16 method; I'm just wondering about that "loop carried reuse" issue.)
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Thu Sep 03, 2009 9:45 am
Hi Matt,
The outer "i" loop is being parallelized. However, the inner loop is not, because every iteration of the k loop assigns to the same element of fsdir (i.e. loop-carried reuse). Run serially, the loop simply leaves fsdir(i) = tda(i,np,2), the value from the last k iteration. If the k loop were parallelized, all "k" threads would try to store their values to the same location, leading to non-deterministic results. To parallelize the k loop, you'll need to make fsdir a two-dimensional array.
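A minimal sketch of the point above, in plain Fortran. The module `reuse_demo` and its two subroutines are hypothetical; `m`, `np`, `tda` and `fsdir` follow the post, and the `(i, k, field)` layout of `tda` with k starting at 0 is an assumption. The directive uses the later OpenACC standard's `parallel loop collapse(2)` spelling rather than the 2009 PGI Accelerator `!$acc region` syntax, so treat it as illustrative; without an accelerator flag it is just a comment and the code runs serially.

```fortran
module reuse_demo
  implicit none
contains
  ! Original pattern: every k iteration writes the same fsdir(i),
  ! so in serial order the last iteration (k = np) wins.
  subroutine last_write_wins(tda, fsdir)
    real, intent(in)  :: tda(:, 0:, :)   ! assumed layout: (i, k, field)
    real, intent(out) :: fsdir(:)
    integer :: i, k, m, np
    m  = size(tda, 1)
    np = ubound(tda, 2)
    do i = 1, m
      do k = 0, np
        fsdir(i) = tda(i, k, 2)
      end do
    end do
  end subroutine last_write_wins

  ! Hypothetical fix: give the result a k dimension so each (i,k)
  ! pair owns its own element; both loops can then run in parallel.
  subroutine two_dimensional(tda, fsdir2)
    real, intent(in)  :: tda(:, 0:, :)
    real, intent(out) :: fsdir2(:, 0:)
    integer :: i, k, m, np
    m  = size(tda, 1)
    np = ubound(tda, 2)
    !$acc parallel loop collapse(2) copyin(tda) copyout(fsdir2)
    do i = 1, m
      do k = 0, np
        fsdir2(i, k) = tda(i, k, 2)
      end do
    end do
  end subroutine two_dimensional
end module reuse_demo
```

The first routine reproduces the serial meaning of the original loop; the second removes the reuse at the cost of storing `m*(np+1)` values instead of `m`.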
Note that we are working on adding support for reductions within accelerator regions. My guess is that your code is more like "fsdir(i) = fsdir(i) + tda(i,k,2)", in which case we should be able to parallelize the inner loop once this support has been added.
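If the inner loop really is an accumulation, as guessed above, it can be written as a per-row sum. This sketch assumes that form of the real code; the module `reduction_demo` and routine `row_sums` are hypothetical, the `tda` layout is the same assumption as before, and the `reduction(+:s)` clause is the later OpenACC standard's spelling, not a feature of the 2009 PGI Accelerator compiler being discussed.

```fortran
module reduction_demo
  implicit none
contains
  ! Hypothetical accumulating version: fsdir(i) = sum over k of tda(i,k,2).
  subroutine row_sums(tda, fsdir)
    real, intent(in)  :: tda(:, 0:, :)   ! assumed layout: (i, k, field)
    real, intent(out) :: fsdir(:)
    integer :: i, k
    real :: s
    ! Rows are independent, so the i loop is spread across gangs;
    ! each gang gets its own private copy of the accumulator s.
    !$acc parallel loop gang private(s) copyin(tda) copyout(fsdir)
    do i = 1, size(tda, 1)
      s = 0.0
      ! The reduction clause tells the compiler how to combine the
      ! partial sums computed by different vector lanes for this row.
      !$acc loop vector reduction(+:s)
      do k = 0, ubound(tda, 2)
        s = s + tda(i, k, 2)
      end do
      fsdir(i) = s
    end do
  end subroutine row_sums
end module reduction_demo
```

Outside an accelerator region the same result is simply `fsdir = sum(tda(:, :, 2), dim=2)`; the explicit loop only matters when you want the compiler to schedule the inner reduction itself.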
- Mat
TheMatt
Joined: 06 Jul 2009 Posts: 263 Location: Greenbelt, MD
Posted: Thu Sep 03, 2009 12:06 pm
You know, you are right and that is actually what the code is doing (in some ways). Guess I've found a place to redo a bit of coding!
Thanks,
Matt