Date: Mon, 4 Aug 2014 13:06:37 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: C threads, v3.0

On Mon, Aug 04, 2014 at 06:48:59PM +0200, Jens Gustedt wrote:
> Am Montag, den 04.08.2014, 10:50 -0400 schrieb Rich Felker:
> > On Mon, Aug 04, 2014 at 11:30:03AM +0200, Jens Gustedt wrote:
> > > I'd like to discuss two issues before going further. The first is of
> > > minor concern. In many places of that code _m_lock uses a flag, namely
> > > 0x40000000. I only found places where this bit is read, but not where
> > > it is set. Did I overlook something or is this a leftover?
> > 
> > It's set by the kernel when a thread dies while owning a robust mutex.
> 
> ah, ok, good to know, this was not obvious at all to me.
> 
> so I will leave this in, although C mutexes are not supposed to be
> robust.

If the C committee decides that the case of exiting with a mutex
locked needs to be handled (i.e. the mutex must be permanently locked,
not falsely owned by a new thread that happens to get the same id)
then we need to track such mutexes in the robust list. Technically we
wouldn't have to rely on the kernel to do anything with them (that
only matters for process-shared case; for process-local, we control
thread exit and can handle them ourselves) but letting the kernel do
it does reduce the amount of code in pthread_exit, so it might be
desirable still.
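
For reference, here is a minimal sketch of how a lock path can observe
that bit. The constant matches the kernel's FUTEX_OWNER_DIED from
<linux/futex.h>; the function and argument names are illustrative, not
musl's actual _m_lock code:

    #include <errno.h>

    #define FUTEX_OWNER_DIED 0x40000000 /* set by kernel on owner death */

    /* Illustrative trylock: the lock word holds the owner's tid. */
    static int trylock_sketch(volatile int *lockword, int self_tid)
    {
        int old = __sync_val_compare_and_swap(lockword, 0, self_tid);
        if (!old) return 0;                            /* acquired */
        if (old & FUTEX_OWNER_DIED) return EOWNERDEAD; /* owner died */
        return EBUSY;
    }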

> > > The second issue concerns tsd...
> 
> > > I don't think that this policy is ideal for C threads, but it could
> > > perhaps be revised for pthreads, too. My idea is the
> > > following. (version for C threads with minimal impact on pthreads)
> > > 
> > >  - don't assign a tsd in thrd_create
> > > 
> > >  - change pthread_getspecific, pthread_setspecific, tss_get, and
> > >    __pthread_tsd_run_dtors such that they check for nullness of
> > >    tsd. This is a trivial and inexpensive change.
> > >    pthread_setspecific may return ENOMEM or EINVAL in such cases. The
> > >    getters should just return 0. __pthread_tsd_run_dtors would then
> > >    obviously just do nothing.
> > 
> > EINVAL is definitely not permitted here since ENOMEM is required by
> > POSIX for this case, if it happens.
> 
> My idea is that ENOMEM doesn't fit too well here, since it is not this
> call to pthread_setspecific that is out of memory; rather, the key is
> invalid for this specific thread. (It would only happen to C threads
> that use pthread_setspecific.)

But the key isn't invalid; it was a valid key obtained by
pthread_key_create or the C11 version. Rather, the failure is a
failure to obtain storage to store the value.
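
Concretely, the check under discussion would amount to something like
this (a sketch with a hypothetical stand-in for the thread descriptor,
not the actual musl source):

    #include <errno.h>

    /* hypothetical stand-in for the relevant thread-descriptor fields */
    struct thread_desc {
        void **tsd;
        int tsd_used;
    };

    /* The setter reports ENOMEM when no per-thread table could be
       obtained; the key itself is still valid, so EINVAL would be wrong. */
    static int setspecific_sketch(struct thread_desc *self,
                                  unsigned k, const void *v)
    {
        if (!self->tsd) return ENOMEM;
        self->tsd[k] = (void *)v;
        self->tsd_used = 1;
        return 0;
    }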

> > >  - change tss_set to mmap the table if it is absent and it is
> > >    called with a non-null pointer argument (i.e., if it really has
> > >    to store something). C11 allows this function to fail, so we
> > >    could return thrd_error if mmap fails.
> > > 
> > >  - change thrd_exit to check for tsd and to unmap it at the end if
> > >    necessary
> > > 
> > > For thrd_create, one consequence would be that main no longer has to
> > > be treated specially in this respect. The additional mmap and munmap
> > > in tss_set would only ever involve whole pages. This should add only
> > > minimal runtime overhead, and only for threads that actually call
> > > tss_set with a non-null pointer to store.
> > > 
> > > Please let me know what you think.
> > 
> > This is not an acceptable implementation (at least from the standpoint
> > of musl's design principles); it sacrifices fail-safe behavior
> > (guaranteeing non-failure of an interface whose failure is usually
> > impossible to deal with, and for which there's no fundamental reason
> > it should be able to fail) to save a tiny amount of memory.
> 
> I see. (Also it is not *only* to save memory; it is also to simplify
> the implementation and to put the burden on the real users of the
> feature.)
> 
> I am not completely convinced of your analysis. pthread_key_create and
> pthread_setspecific have failure paths that are defined by
> POSIX.

POSIX also allows unspecified failures for pthread_mutex_unlock, but
obviously if unlock fails when it should not, there's no way the
program can proceed.

In general programs are very bad about handling resource allocation
failures. The typical behavior, even in library code, is to abort the
whole program, which is obviously bad. So whenever we can guarantee
that a failure will not happen, even if POSIX doesn't mandate that it
can't happen, that's a big win for the fail-safety of the system as a
whole.

> > For static linking, I think it's completely acceptable to assume that,
> > if tsd functions are linked, they're going to be used.
> 
> As I said, I think this is a feature that is rarely used anyhow.  For
> the code that actually links pthread_key_create, it is not at all
> clear to me that pthread_setspecific is then used by a lot of
> threads. I don't have stats for this, obviously, but the tss stuff
> will be used even less.

It's not that rare, and it's the only way in portable pre-C11 POSIX
programs to use thread-local storage. Presumably most such programs
check for the availability of gcc-style __thread and use it if
possible, only falling back to the old pthread TSD if it's not.
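
That fallback is typically a compile-time switch along these lines
(schematic; the HAVE_TLS macro stands in for whatever detection the
build system performs):

    #include <pthread.h>

    #ifdef HAVE_TLS                  /* compiler-supported __thread */
    static __thread void *state;
    #define get_state()  (state)
    #define set_state(p) (state = (p))
    #else                            /* portable pre-C11 POSIX fallback */
    static pthread_key_t state_key;  /* created once via pthread_once */
    #define get_state()  pthread_getspecific(state_key)
    #define set_state(p) pthread_setspecific(state_key, (p))
    #endif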

However there's also another BIG use case that's not obsolete:
POSIX/C11 TSD keys are the only way to store thread-local data that
needs a destructor to be called when the thread exits. Even if gcc
provides an extension to do dtors on TLS objects in C like C++11 TLS
(I'm not sure if it does), gcc's implementation of C++11 TLS
ctors/dtors is not fail-safe or AS-safe; it requires gratuitous
allocation of a structure to store the dtor, and aborts the program if
this allocation fails.
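
For example, the standard pattern for per-thread state with automatic
cleanup hinges entirely on the key's destructor (plain POSIX; the
4096-byte buffer is just an illustration):

    #include <pthread.h>
    #include <stdlib.h>

    static pthread_key_t buf_key;
    static pthread_once_t buf_once = PTHREAD_ONCE_INIT;

    static void buf_dtor(void *p) { free(p); }  /* runs at thread exit */
    static void make_key(void) { pthread_key_create(&buf_key, buf_dtor); }

    /* lazily allocate a per-thread buffer; the dtor frees it on exit */
    static void *get_buf(void)
    {
        void *p;
        pthread_once(&buf_once, make_key);
        p = pthread_getspecific(buf_key);
        if (!p && (p = malloc(4096)))
            pthread_setspecific(buf_key, p);
        return p;
    }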

> > Unfortunately
> > this isn't possible for dynamic linking, and I had considered an
> > alternate implementation for dynamic linking, based on having the
> > first call to pthread_key_create simulate loading of a shared library
> > containing a TLS array of PTHREAD_KEYS_MAX pointers. This would make
> > the success/failure detectable early at a time where there's
> > inherently a possibility of failure (too many keys) rather than where
> > failure is just a consequence of a bad implementation.
> 
> I am not sure what you want to imply by /bad/ implementation :)
> 
> Just to be clear, the approach I propose would indeed open a
> potential failure path if the process ran out of memory, correct. But
> the real problem on failure here would be the lack of memory, not
> the failure of the pthread_setspecific call itself. In the current
> model the corresponding call to pthread_create or thrd_create would
> have failed with ENOMEM.

Right. It's much better for pthread_create (which is fundamentally a
resource allocation action) to fail than for assignment to an
allocated resource (already conceptually allocated by
pthread_key_create) to fail. Although it's not as bad as real TLS
failure like glibc has (since there's no way to report the failure of
the latter; it has to crash), it's still a situation that's difficult
to handle at run-time and where applications and even libraries are
likely to handle it just by aborting the program.
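
That is, with the current model the failure surfaces at the one call
where callers already expect to handle resource exhaustion (hypothetical
wrapper, not musl code):

    #include <pthread.h>

    /* creation is the natural point to report resource exhaustion */
    static int spawn(void *(*worker)(void *), void *arg)
    {
        pthread_t t;
        int r = pthread_create(&t, 0, worker, arg);
        if (r) return r; /* e.g. EAGAIN: caller can back off or queue */
        pthread_detach(t);
        return 0;
    }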

> > But the
> > complexity of doing this, and the added unnecessary failure case
> > (failure before PTHREAD_KEYS_MAX is hit, which really shouldn't
> > happen), seemed unjustified for a 0.5-1k savings.
> 
> I completely agree that such faking of a shared library load would be
> overkill.
> 
> For C threads, there would also be an option to avoid messing around
> with the pthread key system and keep that intact. We don't know what
> POSIX will decide on how they should interact, anyhow.

The only possible interaction we could care about would be if POSIX
specified that PTHREAD_KEYS_MAX keys must be allocatable even if some
C11 keys have already been allocated, and/or vice versa. In that case
we would need to track the number of each separately and allow a
higher combined total. But I don't see how this would otherwise affect
the requirements on, or implementation of, either one.
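
If POSIX did impose such a requirement, the bookkeeping might be as
simple as two counters against a table sized for the sum (hypothetical
names and limits, locking omitted):

    /* hypothetical per-flavor quotas; the tsd table holds the sum */
    #define POSIX_KEYS_MAX 128 /* i.e. PTHREAD_KEYS_MAX */
    #define C11_KEYS_MAX   128 /* matching quota for tss_t */

    static int posix_keys_used, c11_keys_used;

    static int reserve_key(int is_c11)
    {
        int *n = is_c11 ? &c11_keys_used : &posix_keys_used;
        int lim = is_c11 ? C11_KEYS_MAX : POSIX_KEYS_MAX;
        if (*n >= lim) return -1; /* per-flavor limit reached */
        ++*n;
        return 0;
    }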

> We could implement tss separately. That's quite simple to do.

Yes, but that of course increases the cost of a largely-useless
feature (except in the case of needing a dtor), especially in
per-thread memory, which I would like to avoid unless it turns out to
be necessary.

Rich
