the code fails to compile with -Minline

@and
Posts: 43
Joined: Sep 22 2017

Post by @and » Wed Mar 18, 2020 1:08 pm

Hello. I am working on some code at https://github.com/AndStorm/QUESTION.git (last commit; the build directory is the one containing nbody.cpp). The code works properly on the CPU with both gcc and PGI 19.10. It also compiles for the GPU (a GeForce 650 Ti, compute capability 3.0, in a machine with an Intel Core i7 CPU) using this compile line:

cmake . -DCMAKE_C_COMPILER=pgcc -DCMAKE_CXX_COMPILER=pgc++ \
  -DCMAKE_C_FLAGS="-acc -Minfo=acc -mcmodel=medium -ta=tesla:cc30 -Mcuda=cuda10.1" \
  -DCMAKE_CXX_FLAGS="-acc -Minfo=acc -mcmodel=medium -ta=tesla:cc30 -Mcuda=cuda10.1" \
  -DCMAKE_CXX_STANDARD=17 -DACC=ON -DCUDA=ON
The problem is that, when built without inlining (that is, without the -Minline compile option), the code does not work properly on the GPU with PGI 19.10 + OpenACC. But when I add -Minline to the compile line, compilation fails with (ERROR.dat):
/opt/pgi/linux86-64-llvm/19.10/share/llvm/bin/opt: /tmp/pgc++2-XcIvZiTGvW.ll:1646:103: error: use of undefined value '@__sti___70__home_70_gaa_NFbuild_script_CHECK_GPU_CURRENT_WORK_TEMP_COPY_nbody_cpp_bc1207be'
@llvm.global_ctors = appending global [4 x { i32, void ()* }][{ i32, void ()* } { i32 65535, void ()* @__sti___70__home_70_gaa_NFbuild_script_CHECK_GPU_CURRENT_WORK_TEMP_COPY_nbody_cpp_bc1207be }, { i32, void ()* } { i32 65535, void ()* @..acc_data_constructor_1 }, { i32, void ()* } { i32 65535, void ()* @..acc_cuda_funcreg_constructor_1 }, { i32, void ()* } { i32 65535, void ()* @Mcuda_compiled }]
^
CMakeFiles/Test.dir/build.make:62: recipe for target 'CMakeFiles/Test.dir/nbody.cpp.o' failed
make[2]: *** [CMakeFiles/Test.dir/nbody.cpp.o] Error 2
CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/Test.dir/all' failed
make[1]: *** [CMakeFiles/Test.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
I spent the whole day looking for a way to fix the error, but found none.
Please help me get this code to work.

mkcolg
Posts: 8319
Joined: Jun 30 2004

Re: the code fails to compile with -Minline

Post by mkcolg » Thu Mar 19, 2020 10:54 am

Hi Andrey,

I was able to recreate the issue with PGI 19.10 and 20.1, and it does appear to be a compiler code generation issue. It looks like we already have a fix in place, which will be available in the next release.

We were planning on releasing next week in conjunction with GTC, but due to COVID-19 disruptions, we're having to postpone the release a bit. It may be a few more weeks.

-Mat

@and
Posts: 43
Joined: Sep 22 2017

Re: the code fails to compile with -Minline

Post by @and » Thu Mar 19, 2020 2:00 pm

Hi, Mat.
Thank you for the answer. Of course, I understand that the COVID-19 disruptions are postponing the PGI release.
But this code is part of my current work, which I have to report on in the coming weeks. I am afraid I will not have time to wait for the next PGI release (as you wrote, a few more weeks). So I am in trouble.

Maybe there is some workaround to get this code to work (for example, changing the architecture of the code or the CPU-to-GPU data transfers so that it compiles for the GPU with -Minline)?

Maybe if I delete #pragma acc declare create(...) on line 12 of body/include/T3AllocateData.h, add #pragma acc data copyin(particles) on line 45 of nbody.cpp, and pass the particles array as a function parameter (adding one more parameter, Particle<double> * particles, to GetFS(...) in body/include/T3Process.h and in tpt/include/T3InelasticddImpl.h), the code will compile with -Minline?

If you see a way to get this code to work on the GPU with PGI 19.10 + OpenACC, perhaps by changing its architecture or how the data is allocated on the GPU, please tell me. It is very important to me.

Thank You very much.
Andrey.

mkcolg
Posts: 8319
Joined: Jun 30 2004

Re: the code fails to compile with -Minline

Post by mkcolg » Thu Mar 19, 2020 3:48 pm

"Maybe if I delete #pragma acc declare create(...) on line 12 of body/include/T3AllocateData.h, add #pragma acc data copyin(particles) on line 45 of nbody.cpp, and pass the particles array as a function parameter (adding one more parameter, Particle<double> * particles, to GetFS(...) in body/include/T3Process.h and in tpt/include/T3InelasticddImpl.h), the code will compile with -Minline?"

Possibly? Although I'm not a compiler engineer and can't really read the intermediate LLVM code, I do believe the issue has to do with "particles" in some way, since "llvm.global_ctors" is the list of global constructors used to create global classes and structs. I don't know what the missing symbol refers to, though. Possibly the compiler is inadvertently removing this symbol because, after inlining, it no longer sees that it's used. In the 20.3 pre-release, this symbol is not removed.

Of course, this is just a guess as to what's going on. I tried adding "__attribute__((noinline))" to the Particle and T3LorentzVector constructors so they don't get inlined, but it didn't change anything.
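
For reference, a minimal sketch of what marking a constructor as non-inlinable looks like. The Particle struct and the make_and_read helper here are hypothetical stand-ins, not the classes from the repository, and as noted above this experiment did not change the outcome:

```cpp
// Hypothetical stand-in for the repository's Particle class (assumption).
struct Particle {
    double x;

    // GNU-style attribute asking the compiler not to inline this constructor.
    __attribute__((noinline)) explicit Particle(double x_) : x(x_) {}
};

// Hypothetical helper: constructs a Particle through the noinline
// constructor and reads the stored value back.
double make_and_read(double value) {
    Particle p(value);
    return p.x;
}
```

The attribute only affects whether calls to the constructor are inlined; it does not change what the constructor does.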

Note, the problem I see with removing "particles" from the "declare" directive is that you access "particles" directly from subroutines. If any of these routines are offloaded to the GPU, then "declare" is the only way to create a globally accessible variable. If this is the case, then you'd also need to pass "particles" as an argument to these routines so the global reference isn't necessary.
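
A rough sketch of the argument-passing alternative, under stated assumptions: the Particle struct is a simplified stand-in for the repository's Particle<double>, and the names weighted and weighted_sum are hypothetical. The thread does not confirm that this restructuring avoids the -Minline failure; it only illustrates replacing a global reference with an explicit parameter plus a data region.

```cpp
// Simplified, hypothetical stand-in for Particle<double> (assumption).
struct Particle {
    double x;
    double w;
};

// Device-callable helper: the array arrives as an argument, so no
// global "declare create" variable is referenced inside the routine.
#pragma acc routine seq
inline double weighted(const Particle* particles, int i) {
    return particles[i].x * particles[i].w;
}

// Copies the array to the device for the duration of the data region,
// then reduces x*w over all elements. Without OpenACC enabled, the
// pragmas are ignored and the loop runs on the host.
double weighted_sum(const Particle* particles, int n) {
    double sum = 0.0;
    #pragma acc data copyin(particles[0:n])
    {
        #pragma acc parallel loop reduction(+:sum) present(particles[0:n])
        for (int i = 0; i < n; ++i) {
            sum += weighted(particles, i);
        }
    }
    return sum;
}
```

Every routine that previously read the global would need the extra parameter threaded through its call chain, which is the cost of dropping "declare".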

-Mat
