PGI User Forum

Loop carried dependence

 
chris.sl.lim



Joined: 11 Jan 2013
Posts: 15

Posted: Wed Mar 06, 2013 4:59 am    Post subject: Loop carried dependence

Hi all,

I have a very simple problem: I'm trying to accelerate the following loop using OpenACC.

Code:
      IB(1)  = 1
      IBB(1) = 1
      DO 5 I =2,IMM1
      IB(I) = 1 + (I-1)/NSBLOCK
      NLEFT = IMM1 - (IB(I)-1)*NSBLOCK
      IF(NLEFT.LT.NSMID)  IB(I) = IB(I-1)
      IBB(I) = 1 + (I-1)/NBBLOCK
      NLEFT  = IMM1 - (IBB(I)-1)*NBBLOCK
      IF(NLEFT.LT.NBMID)  IBB(I) = IBB(I-1)
    5 CONTINUE


When I try to compile using OpenACC, I get the following messages:

Loop carried dependence of 'ib' prevents parallelization
Loop carried backward dependence of 'ib' prevents vectorization
Loop carried dependence of 'ibb' prevents parallelization
Loop carried scalar dependence for 'nleft' at line 4622
Loop carried backward dependence of 'ibb' prevents vectorization

I've tried splitting the loop as per the directions in this article (http://www.pgroup.com/lit/articles/insider/v1n2a1.htm), but I'm still having trouble getting this loop offloaded to the accelerator.

Any pointers as to a work-around would be much appreciated. Apologies if I've missed something obvious, but this is all rather new to me!

Thanks,

Chris
mkcolg



Joined: 30 Jun 2004
Posts: 6632
Location: The Portland Group Inc.

Posted: Wed Mar 06, 2013 11:29 am    Post subject: Re: Loop carried dependence

Hi Chris,

Unfortunately, you have two backwards dependencies which will prevent parallelization.

Code:
...
IF(NLEFT.LT.NSMID)  IB(I) = IB(I-1)
...
IF(NLEFT.LT.NBMID)  IBB(I) = IBB(I-1)
...


The value of "IB(i-1)" has to be computed before it can be assigned to "IB(i)", so each iteration depends on the one before it. Unless you can remove these statements, there's no way to parallelize this loop.
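To make the dependence concrete, here is a minimal sketch of the IB half of the loop as a stand-alone program. The values of IMM1, NSBLOCK and NSMID are hypothetical, picked only so that the IF test fires in the last block (I = 9 and 10); they are not from the original code.

```fortran
      PROGRAM DEPDEMO
!     Hypothetical sizes, chosen only for illustration.
      INTEGER IMM1, NSBLOCK, NSMID
      PARAMETER (IMM1=10, NSBLOCK=4, NSMID=3)
      INTEGER IB(IMM1), I, NLEFT

      IB(1) = 1
      DO 5 I = 2, IMM1
         IB(I) = 1 + (I-1)/NSBLOCK
         NLEFT = IMM1 - (IB(I)-1)*NSBLOCK
!        When this test fires, iteration I reads the value that
!        iteration I-1 wrote, so I-1 has to finish first.
         IF (NLEFT.LT.NSMID) IB(I) = IB(I-1)
    5 CONTINUE

      PRINT *, IB
      END
```

With these sizes, IB(10) takes its value from IB(9), which in turn took its value from IB(8). If iterations 9 and 10 ran at the same time, iteration 10 could read IB(9) before it had been corrected, which is the situation the compiler messages are warning about.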

Assuming this is part of a larger section of accelerated code, the question becomes whether it's faster to run the loop on the host and then copy the IB and IBB arrays to the device later, or to run the loop sequentially on the device.
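The two options could look roughly like the sketch below. It is an untested illustration rather than a recommended implementation: the array sizes and block parameters are hypothetical, the enclosing data region stands in for whatever the larger accelerated section already uses, and in real code you would pick one option, not both. Running sequentially on the device is expressed here with a single gang, a vector length of 1 and "loop seq", which is one way to do it among several.

```fortran
      PROGRAM IBDEMO
!     Hypothetical sizes, for illustration only.
      INTEGER IMM1, NSBLOCK, NSMID, NBBLOCK, NBMID
      PARAMETER (IMM1=10, NSBLOCK=4, NSMID=3, NBBLOCK=2, NBMID=3)
      INTEGER IB(IMM1), IBB(IMM1), I, NLEFT

!     Stand-in for the data region of the larger accelerated code.
!$acc data create(ib, ibb)

!     Option 1: run the dependent loop on the host, then copy the
!     finished arrays to the device.
      IB(1)  = 1
      IBB(1) = 1
      DO 5 I = 2, IMM1
         IB(I) = 1 + (I-1)/NSBLOCK
         NLEFT = IMM1 - (IB(I)-1)*NSBLOCK
         IF (NLEFT.LT.NSMID) IB(I) = IB(I-1)
         IBB(I) = 1 + (I-1)/NBBLOCK
         NLEFT  = IMM1 - (IBB(I)-1)*NBBLOCK
         IF (NLEFT.LT.NBMID) IBB(I) = IBB(I-1)
    5 CONTINUE
!$acc update device(ib, ibb)

!     Option 2: run the same loop on the device, but sequentially
!     (one gang, one vector lane, loop marked seq).
!$acc parallel num_gangs(1) vector_length(1) present(ib, ibb)
      IB(1)  = 1
      IBB(1) = 1
!$acc loop seq
      DO 6 I = 2, IMM1
         IB(I) = 1 + (I-1)/NSBLOCK
         NLEFT = IMM1 - (IB(I)-1)*NSBLOCK
         IF (NLEFT.LT.NSMID) IB(I) = IB(I-1)
         IBB(I) = 1 + (I-1)/NBBLOCK
         NLEFT  = IMM1 - (IBB(I)-1)*NBBLOCK
         IF (NLEFT.LT.NBMID) IBB(I) = IBB(I-1)
    6 CONTINUE
!$acc end parallel

!     Only needed here so the host can print the device results.
!$acc update host(ib, ibb)
!$acc end data

      PRINT *, IB
      PRINT *, IBB
      END
```

Built without OpenACC enabled, the directives are treated as comments and both loops simply run on the host. Which option is faster depends on how large IMM1 is and how costly the transfer is compared with a single-threaded device kernel, so it is worth timing both in the real application.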

- Mat