PGI User Forum

illegal opcode error

 
chris.sl.lim



Joined: 11 Jan 2013
Posts: 15

Posted: Thu Apr 18, 2013 9:51 am    Post subject: illegal opcode error

Hi Mat,

I have a big outer loop (containing many inner loops) that I wish to parallelise. The iterations of this outer loop are all independent of one another. For now, I'm happy for all the inner loops to run in serial. What is the best way to implement this?

I have used the "independent" clause to try to get this working, and I have privatised a number of variables that were giving me trouble.

When I compile, I now get this error:
Code:
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): illegal opcode (tblock-07.5.f90: 11914)
set_flux_gpu:


After this message, the code appears to generate a kernel, but it all runs on the CPU. Any pointers as to where I'm going wrong would be good.


Code:
 11940, Loop is parallelizable
         Accelerator kernel generated
      11940, !$acc loop gang ! blockidx%x
      11964, !$acc loop vector(128) ! threadidx%x
      12042, !$acc loop vector(128) ! threadidx%x
      ...
      14734, !$acc loop vector(128) ! threadidx%x
      14746, !$acc loop vector(128) ! threadidx%x


After this I get a load of messages telling me that various dependencies prevent parallelization of the inner loops, which I'm ignoring for now.

Chris
chris.sl.lim

Posted: Thu Apr 18, 2013 10:08 am

Sorry to tag more problems onto the same post. I'm getting the following errors:

Code:
  11939, Accelerator restriction: scalar variable live-out from loop: cfwall
         Accelerator restriction: scalar variable live-out from loop: vislam


I privatised these variables to get around the problem, which is how I arrived at the error mentioned in the previous post, but I don't really have a good reason for doing so.

Both of these scalars are read in at the start of the program (before the GPU loop) and are then only used within the loop (albeit in an inlined function call). They are not used after the GPU loop.

Is this an issue with the inlining, and if so, is there a workaround?

Thanks,

Chris
mkcolg



Joined: 30 Jun 2004
Posts: 6208
Location: The Portland Group Inc.

Posted: Thu Apr 18, 2013 10:18 am

Hi Chris,
Quote:
illegal opcode
Unfortunately, this is a generic internal compiler error meaning that the compiler has detected that it generated bad code. I've seen this in a few codes, but the reason has been different in each. I'd need you to send in your updated code which reproduces the problem, so I can pass it on to engineering for investigation.

Quote:
Is this an issue with the inlining, and if so, is there a work round?
Scalar variables passed to routines (even inlined routines) can cause side effects that the compiler can't detect at compile time, hence the "live-out" error. For performance reasons I typically recommend not privatizing scalars, but this is one case where you need to.
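As a rough illustration of the point above, here is a minimal sketch of privatizing scalars that are passed to a routine inside an accelerated loop. Only the names cfwall and vislam come from this thread; the module, the helper scale_by_wall, the arrays and all values are hypothetical. It also assumes a compiler that supports the OpenACC 2.0 "routine" directive (the 13.x compiler discussed here relied on inlining instead). Under the OpenACC specification a private copy does not inherit the host value, so the sketch assigns both scalars inside the loop before using them.

```fortran
module wall_mod
  implicit none
contains
  ! Hypothetical helper standing in for the inlined function in the thread.
  subroutine scale_by_wall(visc, cf, q)
    !$acc routine seq
    real, intent(in)    :: visc, cf
    real, intent(inout) :: q
    q = q * visc * cf
  end subroutine scale_by_wall
end module wall_mod

program private_scalar_sketch
  use wall_mod
  implicit none
  integer, parameter :: n = 8
  real :: flux(n), visc_in(n)   ! hypothetical arrays
  real :: vislam, cfwall        ! scalar names borrowed from the thread
  integer :: i

  do i = 1, n
    visc_in(i) = real(i)
    flux(i)    = 2.0
  end do

  ! private gives every iteration its own copy of the scalars, so the
  ! compiler no longer has to treat them as live-out of the loop.
  !$acc kernels loop independent private(vislam, cfwall)
  do i = 1, n
    vislam = visc_in(i)   ! assigned before use: private copies start undefined
    cfwall = 0.5
    call scale_by_wall(vislam, cfwall, flux(i))
  end do

  ! flux(8) = 2.0 * 8.0 * 0.5 = 8.0
  print *, flux(n)
  if (abs(flux(n) - 8.0) > 1.0e-6) error stop "unexpected result"
end program private_scalar_sketch
```

Without OpenACC enabled the directives are plain comments, so the same source also runs serially on the CPU and gives the same result.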

- Mat
chris.sl.lim

Posted: Thu Apr 18, 2013 10:27 am

Hi Mat,

I've just fired off an email to TRS; hopefully it will yield something.

Is there a good way of parallelising the outer loop without worrying about the internal loops if all the iterations of the outer loop are independent?

Chris
mkcolg

Posted: Thu Apr 18, 2013 4:32 pm

Hi Chris,

I tried your code against our development compiler and the illegal opcode error goes away. I added TPR#19296 to track your failure and to ask whether the fix for your code can get into the 13.5 release.

Also, I was able to track down where the illegal opcode is coming from in 13.4. It appears to be a problem generating the auto-reduction code for the "DAVGALL" and "DAVG_UNST" sum reduction variables. I'm able to work around the error by adding an explicit reduction clause on the "kernels loop" directive. (See below)

Quote:
Is there a good way of parallelising the outer loop without worrying about the internal loops if all the iterations of the outer loop are independent?


Add "gang, vector" to your "kernels loop" directive. The compiler will still spit out all the dependency analysis Minfo messages for the inner loops, but they will become extraneous.

Code:

!$acc kernels loop gang vector independent reduction(+:DAVGALL,DAVG_UNST), &
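
The two suggestions above (scheduling only the outer loop with "gang vector" and naming the sum reductions explicitly) can be sketched on a toy loop nest. This is only an illustration, not the actual kernel: the array a, the loop bounds and the temporary local_sum are hypothetical, and only the reduction variable names are borrowed from the thread. The "loop seq" directive on the inner loop is an optional addition of this sketch that states the serial intent explicitly.

```fortran
program outer_loop_sketch
  implicit none
  integer, parameter :: nblocks = 256, npts = 64   ! hypothetical sizes
  real :: a(npts, nblocks)                         ! hypothetical data
  real :: davgall, davg_unst, local_sum
  integer :: i, j

  a = 1.0
  davgall   = 0.0
  davg_unst = 0.0

  ! Only the outer loop is spread across gangs and vector lanes.
  ! The explicit reduction clause names both sum variables instead of
  ! leaving them to the compiler's automatic reduction detection.
  !$acc kernels loop gang vector independent private(local_sum) reduction(+:davgall, davg_unst)
  do j = 1, nblocks
    local_sum = 0.0
    ! Inner loop runs serially within each outer iteration.
    !$acc loop seq
    do i = 1, npts
      local_sum = local_sum + a(i, j)
    end do
    davgall   = davgall   + local_sum
    davg_unst = davg_unst + 0.5 * local_sum
  end do

  ! Expected sums: 256 * 64 = 16384 and half of that, 8192.
  print *, davgall, davg_unst
  if (abs(davgall - 16384.0) > 1.0e-3) error stop "davgall mismatch"
  if (abs(davg_unst - 8192.0) > 1.0e-3) error stop "davg_unst mismatch"
end program outer_loop_sketch
```

Compiling with accelerator Minfo output enabled should then report a schedule for the outer loop only, with the inner loop left sequential.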


Do you have data files and expected output that I can use to run and verify the code?
Code:
% tblock-07.5_dev
PGFIO-F-217/formatted read/unit=5/attempt to read past end of file.
 File name = turbine.dat    formatted, sequential access   record = 1


Would this code be available for other purposes once everything is working? Given that this is a ~3000 line kernel, it makes for a nice test for our internal QA. Plus, I'm looking for codes I can use in an OpenACC benchmarking effort I'm doing with several other companies through SPEC (www.spec.org). I'm not sure if the code would make a good benchmark, but I wanted to ask before investigating.

Thanks,
Mat
All times are GMT - 7 Hours