Atomics and TLS

Questions on using the PGI Compilers and Tools
nemequ
Posts: 15
Joined: Feb 12 2017

Post by nemequ » Mon Feb 13, 2017 4:12 pm

I'm trying to port some projects over to PGI, but I keep running into problems with atomics and thread-local storage. First off, TLS…

Is there any way to determine whether -c11 was passed to the compiler or, better yet, whether TLS is supported? It would also be acceptable to use a PGI-specific construct, if one exists (preferable, even, if it doesn't require a special flag). Basically, I'm trying to port something like this to PGI:

#if defined(_Thread_local) || (defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L))
#  define THREAD_LOCAL _Thread_local
#elif defined(__GNUC__) || defined(__INTEL_COMPILER) || defined(__SUNPRO_CC) || defined(__IBMCPP__)
#  define THREAD_LOCAL __thread
#elif defined(_WIN32)
#  define THREAD_LOCAL __declspec(thread)
#else
#  error No TLS implementation found.
#endif

_Thread_local seems to work with PGI, but ONLY if -c11 is passed.

__STDC_VERSION__ is defined as 199901L even if -c11 is passed, so my current code emits an error. I can understand that; PGI's C11 support is still incomplete, so advertising it in __STDC_VERSION__ would be premature. Unfortunately, though, it puts me in a tough spot… AFAICT there is no way to tell in the preprocessor whether the compiler supports _Thread_local.

Normally I would use __PGIC__, __PGIC_MINOR__, and __PGIC_PATCHLEVEL__ to check for support (and ignore __STDC_VERSION__), but since TLS only works in C11 mode (unlike other compilers) that doesn't do much good. I've also tried using C11 macros like __STDC_NO_THREADS__ and __STDC_NO_ATOMICS__ in hopes of detecting whether PGI is in C11 mode, but unlike _Thread_local they're defined in PGI's C99 mode, too.
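Until the preprocessor can detect this, one possible workaround is to let the build state it explicitly. The following is only a sketch: `MYLIB_THREAD_LOCAL` is a hypothetical override macro (not a PGI or standard name) that a user would define alongside -c11, for example with `-DMYLIB_THREAD_LOCAL=_Thread_local`. The `call_count` variable and `bump_call_count` function are likewise made up for illustration.

```c
/* MYLIB_THREAD_LOCAL is a hypothetical, user-supplied override. It is
 * checked first so a build that knows better can bypass auto-detection. */
#if defined(MYLIB_THREAD_LOCAL)
#  define THREAD_LOCAL MYLIB_THREAD_LOCAL
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
#  define THREAD_LOCAL _Thread_local
#elif defined(__GNUC__) || defined(__INTEL_COMPILER) || defined(__SUNPRO_CC) || defined(__IBMCPP__)
#  define THREAD_LOCAL __thread
#elif defined(_WIN32)
#  define THREAD_LOCAL __declspec(thread)
#else
#  error No TLS implementation found; define MYLIB_THREAD_LOCAL.
#endif

/* Each thread gets its own copy of this counter. */
static THREAD_LOCAL int call_count = 0;

/* Increments and returns the calling thread's private counter. */
int bump_call_count(void) {
    return ++call_count;
}
```

The cost of this approach is that the header is no longer fully drop-in on PGI, since the user has to pass one extra define.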

As for atomics, is there some variant of atomics which PGI supports that I'm missing? I already have support for

* Old GCC-style (__sync_*)
* New GCC-style (__atomic_*)
* clang-style (__c11_*)
* C11-style (stdatomic.h)
* MS-style (Interlocked*)

I'm happy to add another method, but I can't seem to figure out how to do atomics in PGI. So far my best guess is to require OpenACC or maybe OpenMP, but that's some pretty significant overhead and I'd strongly prefer something which doesn't require a compiler flag; this is for a reusable header which you can currently just drop into any C project and be done with it.

Also, I have a PRNG which requires CAS (or a lock), but I don't see a way to do an atomic compare and swap with OpenACC. This seems like an odd omission… am I missing something, or should I just fall back on a spinlock for OpenACC?
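For reference, the kind of CAS loop such a PRNG needs looks roughly like this. It is a sketch written against C11 `<stdatomic.h>`, one of the styles listed above, and is not something pgcc is confirmed to accept. A generic xorshift32 step stands in for the real generator, and the names `prng_state`, `xorshift32` and `prng_next` are made up for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared generator state; xorshift requires a nonzero seed. */
static _Atomic uint32_t prng_state = 2463534242u;

/* One xorshift32 step: a pure function of the previous state. */
static uint32_t xorshift32(uint32_t x) {
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return x;
}

/* Advances the shared state with a CAS loop and returns the new value. */
uint32_t prng_next(void) {
    uint32_t old = atomic_load(&prng_state);
    uint32_t next;
    do {
        next = xorshift32(old);
        /* On failure, 'old' is refreshed with the current state and we retry. */
    } while (!atomic_compare_exchange_weak(&prng_state, &old, next));
    return next;
}
```

The retry loop is the part that has no obvious equivalent when only atomic read, write and capture are available: the new state depends on the old one through an arbitrary function, so a plain fetch-and-add is not enough.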

jtull
Posts: 1103
Joined: Jun 30 2004

Post by jtull » Wed Feb 15, 2017 9:57 am

I sent your comments to engineering, and they responded:

1. You need to add your own #define in your code to indicate that __thread or _Thread_local is supported by the PGI C compiler. We have a problem here, and I logged TPR 23783 to add this capability.

2. PGI plans to support GCC-style atomics in the C and C++ compilers that can be used outside of OpenMP and OpenACC. We do not currently have that support in pgcc. Use gcc when you need atomics outside those areas.

3. OpenACC does not provide CAS as a first-class directive. It does provide atomic read/write/capture directives, but there is no mechanism to do an atomic compare. Implementing a critical section across gangs/workers/vectors is not guaranteed to work, since the OpenACC execution model allows a thread (owning the lock) to be suspended until other threads complete.

They can only suggest that you do the CAS sequentially.
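To illustrate what the atomic directives do cover: a fetch-and-increment can be expressed with `atomic capture`, even though a compare-and-swap cannot. This is a minimal sketch, and `collect_positive` is a hypothetical function invented for the example. A compiler without OpenACC enabled should ignore the pragmas and run the loop sequentially.

```c
/* Copies the positive elements of 'in' into 'out' and returns how many
 * were found. Under OpenACC the order of elements in 'out' is unspecified. */
int collect_positive(const int *in, int n, int *out) {
    int next = 0;  /* shared cursor into 'out' */
    #pragma acc parallel loop copyin(in[0:n]) copy(out[0:n]) copy(next)
    for (int i = 0; i < n; ++i) {
        if (in[i] > 0) {
            int slot;
            /* Atomically read the cursor and advance it in one step. */
            #pragma acc atomic capture
            slot = next++;
            out[slot] = in[i];
        }
    }
    return next;
}
```

This works because each thread only needs a unique slot, not a value computed from the previous state, which is the case the CAS-based PRNG falls into.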

Post by nemequ » Thu Feb 16, 2017 6:45 pm

jtull wrote: 1. You need to #define in your code to tell that __thread or _Thread_local is supported for the PGI C compiler. We have a problem here, and I logged TPR 23783 to add this capability.

FWIW, I'd prefer it if you just enabled TLS across the board (i.e., in C99 and even C89 mode) as an extension, which is what other compilers (gcc, clang, icc) do.

Since you already have _Thread_local, it seems like supporting __thread wouldn't be much effort, and probably worthwhile to make code easier to port. At least GCC, clang, ICC, SunCC, and IBM XL C support it… there is also the MSVC-specific __declspec(thread).

Anyways, thanks for the info :)

Post by jtull » Tue Feb 20, 2018 11:11 am

TPR 23783 (Set __thread and _Thread_local when -c11/-c1x NOT set) is fixed in the current 18.1 release.

The issue was corrected when we changed the value of the predefined macro __STDC_VERSION__ from the incorrect value 199901L to the correct value 201112L when the '-c11' switch is used.

dave
