chris.sl.lim
Joined: 11 Jan 2013 Posts: 15
Posted: Thu Apr 18, 2013 9:51 am Post subject: illegal opcode error
Hi Mat,
I have a big outer loop (containing many inner loops) that I wish to parallelise. All the data in each iteration of this outer loop is independent from one another. For now, I'm happy for all the inner loops to run in serial (what is the best way to implement this?)
I have used the "independent" in order to try and get this working and privatised a number of variables that were giving me trouble.
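To illustrate the shape of the code, here is a stripped-down sketch (the loop bounds, arrays, and the scalar `tmp` are invented for illustration, not taken from the real source):

```fortran
! Sketch of the structure only, not the actual code: the outer
! iterations are independent, and each inner loop can run serially.
!$acc kernels loop independent private(tmp)
do i = 1, nouter
   tmp = 0.0
   do j = 1, ninner        ! inner loop left serial within each outer iteration
      tmp = tmp + a(i,j)
   end do
   b(i) = tmp
end do
!$acc end kernels
```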
When I compile, I now get this error:
```
PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): illegal opcode (tblock-07.5.f90: 11914)
set_flux_gpu:
```
After this message, the code appears to generate a kernel, but it all runs on the CPU. Any pointers as to where I'm going wrong would be good.
```
11940, Loop is parallelizable
       Accelerator kernel generated
11940, !$acc loop gang ! blockidx%x
11964, !$acc loop vector(128) ! threadidx%x
12042, !$acc loop vector(128) ! threadidx%x
...
14734, !$acc loop vector(128) ! threadidx%x
14746, !$acc loop vector(128) ! threadidx%x
```
After this I get a load of errors telling me that various dependencies are preventing parallelization of the inner loops, which I'm ignoring for now.
Chris
chris.sl.lim
Posted: Thu Apr 18, 2013 10:08 am
Sorry to tag more problems onto the same post. I'm getting the following errors:
```
11939, Accelerator restriction: scalar variable live-out from loop: cfwall
       Accelerator restriction: scalar variable live-out from loop: vislam
```
I privatised these scalars to circumvent the problem, which led to the error mentioned in the previous post, but I don't really have a good reason for doing so.
Both of these scalars are read in at the start of the program (before the GPU loop) and are then only used within the loop (albeit in an inlined function call). They are not used after the GPU loop.
Is this an issue with the inlining, and if so, is there a workaround?
Thanks,
Chris
mkcolg
Joined: 30 Jun 2004 Posts: 4996 Location: The Portland Group Inc.
Posted: Thu Apr 18, 2013 10:18 am
Hi Chris,
Unfortunately, this is a generic internal compiler error meaning that the compiler has detected that it generated bad code. I've seen this in a few codes, but the reason has been different in each case. I'd need you to send in your updated code that reproduces the problem, so I can pass it on to engineering for investigation.
Quote: "Is this an issue with the inlining, and if so, is there a workaround?"
Scalar variables passed to routines (even inlined routines) can cause side-effects that can't be detected at compile time. Hence, the "live-out" error. I typically recommend against privatizing scalars for performance reasons, but this is one case where you need to.
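A minimal sketch of what that looks like (the loop body below is invented for illustration; only the `private` clause and the two scalar names come from the post):

```fortran
! Privatizing the scalars gives each iteration its own copy, so the
! compiler no longer has to assume their values are live after the loop.
!$acc kernels loop independent private(cfwall, vislam)
do i = 1, n
   cfwall  = wallc(i)            ! assigned inside the loop (invented body)
   vislam  = viscf(i)
   flux(i) = cfwall * vislam     ! only read within the same iteration
end do
!$acc end kernels
```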
- Mat
chris.sl.lim
Posted: Thu Apr 18, 2013 10:27 am
Hi Mat,
I've just fired off an email to TRS, hopefully it will yield something.
Is there a good way of parallelising the outer loop, without worrying about the inner loops, if all the iterations of the outer loop are independent?
Chris
mkcolg
Posted: Thu Apr 18, 2013 4:32 pm
Hi Chris,
I tried your code against our development compiler and the illegal opcode error goes away. I added TPR#19296 to track your failure and requested that the fix for your code be included in the 13.5 release.
Also, I was able to track down where the illegal opcode comes from in 13.4. It appears to be a problem generating the auto-reduction code for the "DAVGALL" and "DAVG_UNST" sum reduction variables. I'm able to work around the error by adding an explicit reduction clause on the kernels loop directive. (See below)
Quote: "Is there a good way of parallelising the outer loop without worrying about the internal loops if all the iterations of the outer loop are independent?"
Add "gang, vector" to your "kernels loop" directive. The compiler will still spit out all the dependency-analysis Minfo messages for the inner loops, but they become extraneous.
```fortran
!$acc kernels loop gang vector independent reduction(+:DAVGALL,DAVG_UNST), &
```
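Spelled out in context (the loop body below is a placeholder I've invented; only the directive itself reflects the actual workaround):

```fortran
! The explicit reduction clause works around the bad auto-generated
! reduction code for these two sum variables in 13.4.
!$acc kernels loop gang vector independent &
!$acc& reduction(+:DAVGALL,DAVG_UNST)
do i = 1, n
   DAVGALL   = DAVGALL   + d(i)     ! sum reductions that triggered the
   DAVG_UNST = DAVG_UNST + du(i)    ! illegal-opcode error when implicit
end do
!$acc end kernels
```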
Do you have data files and expected output that I can use to run and verify the code?
```
% tblock-07.5_dev
PGFIO-F-217/formatted read/unit=5/attempt to read past end of file.
File name = turbine.dat formatted, sequential access record = 1
```
Would this code be available for other purposes once everything is working? Given that this is a ~3000-line kernel, it makes for a nice test for our internal QA. Plus, I'm looking for codes I can use in an OpenACC benchmarking effort I'm doing with several other companies through SPEC (www.spec.org). I'm not sure if the code would make a good benchmark, but I wanted to ask before investigating.
Thanks,
Mat