musl - Re: C11 threads

Message-ID: <20140726073501.GI4038@brightrain.aerifal.cx>
Date: Sat, 26 Jul 2014 03:35:01 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: C11 threads

On Sat, Jul 26, 2014 at 09:16:40AM +0200, Jens Gustedt wrote:
> > It's about what happens when a thread exits while holding a recursive
> > or errorchecking mutex. If the ownership of that mutex is tracked by a
> > thread id stored in the mutex (this is the only practical way to do
> > it), a newly created thread could wrongly become the owner of the
> > orphaned mutex just by getting the same thread id (by chance). The
> > only implementation options to avoid this are to have thread ids so
> > large that values never have to be reused, or to track the list of
> > mutexes owned by a thread so that it can change the owner to a dummy
> > value that will never match when it exits.
> > 
> > The obvious way to avoid this problem would be to add to the
> > specification:
> > 
> > "If a thread exits while it is the owner of a mutex, the behavior is
> > undefined."
> 
> noted, I'll watch that something in that sense is included.
> 
> Probably this would need a bit more precision about "exits"; I would
> prefer to use "terminates". A thread does not necessarily terminate
> immediately after it returns or calls "thrd_exit": the tss destructors
> should still be called from the same thread.
> 
> We should allow a thread to clean up its mess with tss destructors.
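The ownership hazard described in the quoted text can be modeled without real threads. The following single-threaded C sketch is hypothetical (owner_mutex, owner_trylock and thread_terminated are invented names, not musl internals): the mutex stores its owner as a small integer id, so a new thread that is handed a reused id passes the ownership check on an orphaned lock. thread_terminated models the second mitigation, rewriting the owner to a sentinel that no live thread can have.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical recursive mutex whose ownership is tracked only by a
   small integer thread id, as described in the quoted text. */
#define NO_OWNER   0    /* id 0 is never handed out to a live thread */
#define DEAD_OWNER (-1) /* sentinel that no live thread id can equal */

struct owner_mutex {
    int owner;      /* owning thread id, or NO_OWNER when unlocked */
    unsigned count; /* recursion depth */
};

/* Try to lock m on behalf of thread tid; returns true on success. */
bool owner_trylock(struct owner_mutex *m, int tid)
{
    if (m->owner == NO_OWNER) {
        m->owner = tid;
        m->count = 1;
        return true;
    }
    if (m->owner == tid) { /* looks like a recursive relock by the owner */
        m->count++;
        return true;
    }
    return false; /* held by another id, or orphaned (DEAD_OWNER) */
}

void owner_unlock(struct owner_mutex *m, int tid)
{
    assert(m->owner == tid && m->count > 0);
    if (--m->count == 0)
        m->owner = NO_OWNER;
}

/* Mitigation: when thread tid terminates, rewrite the owner of every
   mutex it still holds to a value no thread id can match. A real
   implementation would maintain this list per thread as it locks. */
void thread_terminated(struct owner_mutex *held[], int n, int tid)
{
    for (int i = 0; i < n; i++)
        if (held[i]->owner == tid)
            held[i]->owner = DEAD_OWNER;
}
```

For example, if thread 5 exits holding the mutex and a new thread is later assigned id 5, owner_trylock succeeds and simply bumps count; once thread_terminated has run, the same call fails. The other option mentioned, ids large enough never to be reused, would amount to making owner a counter that is never recycled.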

Yes, I agree completely that "terminates" is the right wording.
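A minimal sketch of the cleanup Jens has in mind, assuming the proposed wording (a thread must not still own a mutex when it terminates): a worker calls thrd_exit while holding a mutex, and a tss destructor, which thrd_exit runs in that same thread, releases it before the thread actually terminates. The names held_key, resource, worker and run_demo are hypothetical.

```c
#include <stddef.h>
#include <threads.h>

static tss_t held_key; /* per-thread slot: mutex still held at exit */
static mtx_t resource; /* hypothetical shared resource lock */

/* tss destructor: thrd_exit invokes it in the exiting thread, and only
   when the slot holds a non-null value, so the owner does the unlock. */
static void release_held(void *p)
{
    mtx_unlock(p);
}

static int worker(void *arg)
{
    (void)arg;
    if (mtx_lock(&resource) != thrd_success)
        return 1;
    tss_set(held_key, &resource); /* record what must be released */
    thrd_exit(0);                 /* "exits" with the mutex held... */
}                                 /* ...but terminates only after cleanup */

/* Returns 1 if the mutex is free again once the worker has terminated,
   0 if it is not, -1 if setup failed. */
int run_demo(void)
{
    thrd_t t;
    int ok;
    if (tss_create(&held_key, release_held) != thrd_success)
        return -1;
    if (mtx_init(&resource, mtx_plain) != thrd_success) {
        tss_delete(held_key);
        return -1;
    }
    if (thrd_create(&t, worker, NULL) != thrd_success) {
        mtx_destroy(&resource);
        tss_delete(held_key);
        return -1;
    }
    thrd_join(t, NULL);
    ok = mtx_trylock(&resource) == thrd_success;
    if (ok)
        mtx_unlock(&resource);
    mtx_destroy(&resource);
    tss_delete(held_key);
    return ok;
}
```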

> > > And to my limited experience having well defined atomics that are
> > > integrated in the language, often helps to completely avoid mutexes
> > > and conditions.
> > 
> > I'm not sure about that. Atomics are mostly useful for the situations
> > where spinlocks would suffice. They don't help anywhere you would
> > expect a "wait" operation to happen (e.g. waiting for a queue to
> > become non-empty or non-full).
> 
> Probably we have complementary experiences. Many uses of mutexes and
> conditions in application code are about sharing and updating shared
> resources. Often the resource protected is just a counter or other
> small data. Especially counters are much better served with atomics.
> 
> And you are right to mention that in many situations spinlocks do
> effectively suffice. C11 has atomic_flag for that. Often locks are
> taken just for critical sections of code that do a small task, only a
> few instructions.

I wasn't trying to say that spinlocks suffice in place of atomics, but
rather that atomics can rarely replace synchronization primitives, and
that in situations where you can't use a spinlock (because you expect
it to be waiting a long time for the lock, or because you want to do a
lot with the lock held) atomics are unlikely to solve the problem.
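To illustrate the "wait" case, here is a minimal sketch of a bounded FIFO queue using the C11 threads.h API (the queue type, its functions, and the capacity QCAP are hypothetical). Consumers sleep in cnd_wait while the queue is empty and producers sleep while it is full; atomics alone offer no way to block like this short of spinning. For brevity, the return values of mtx_lock, cnd_wait and cnd_signal are not checked.

```c
#include <threads.h>

#define QCAP 4 /* arbitrary capacity for illustration */

struct queue {
    mtx_t lock;
    cnd_t not_empty, not_full;
    int items[QCAP];
    unsigned head, len; /* index of oldest item, number of items */
};

int queue_init(struct queue *q)
{
    int r;
    q->head = q->len = 0;
    if ((r = mtx_init(&q->lock, mtx_plain)) != thrd_success)
        return r;
    if ((r = cnd_init(&q->not_empty)) != thrd_success)
        goto fail_lock;
    if ((r = cnd_init(&q->not_full)) != thrd_success)
        goto fail_empty;
    return thrd_success;
fail_empty:
    cnd_destroy(&q->not_empty);
fail_lock:
    mtx_destroy(&q->lock);
    return r;
}

/* Block while the queue is full, then append v. */
void queue_push(struct queue *q, int v)
{
    mtx_lock(&q->lock);
    while (q->len == QCAP) /* loop: wakeups may be spurious */
        cnd_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->len) % QCAP] = v;
    q->len++;
    cnd_signal(&q->not_empty);
    mtx_unlock(&q->lock);
}

/* Block while the queue is empty, then remove the oldest item. */
int queue_pop(struct queue *q)
{
    mtx_lock(&q->lock);
    while (q->len == 0)
        cnd_wait(&q->not_empty, &q->lock);
    int v = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->len--;
    cnd_signal(&q->not_full);
    mtx_unlock(&q->lock);
    return v;
}

void queue_destroy(struct queue *q)
{
    cnd_destroy(&q->not_full);
    cnd_destroy(&q->not_empty);
    mtx_destroy(&q->lock);
}
```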

> > > I only need EBUSY, EINVAL, ENOMEM, and ETIMEDOUT, and effectively only
> > > that these are consistent with the rest of the C library, which for
> > > this implementation of C threads will always be musl.
> > 
> > The point of ABI compatibility is that (at this point just some)
> > binaries and (more importantly) shared libraries without source that
> > were built/linked against glibc can be used with musl. But for this to
> > work, the values of the constants need to be the same.
> 
> yes, I am aware of that. That is why it is important to have the
> thrd-constants and the E-constants in line.
> 
> The approach with weak aliases only works if the return codes of the
> functions agree. We need
> 
> enum {
>   thrd_success = 0,
>   thrd_busy = EBUSY,
>   thrd_error = EINVAL,
>   thrd_nomem = ENOMEM,
>   thrd_timedout = ETIMEDOUT,
> };
> 
> and I don't think that there is much of a sensible way to do that
> differently. The naming of the constants alone suggests that these
> are the values that people (who?) had in mind when designing these
> interfaces.
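To make the weak-alias point concrete: if the C11 name is bound directly to the pthread function's code, callers see exactly the codes that code returns, which are errno values. The sketch below is GCC/Clang-specific (ELF targets such as Linux), uses hypothetical hyp_ names throughout, and its lock is a toy that is not thread-safe; it only shows that hyp_mtx_trylock reports hyp_thrd_busy correctly because that constant is defined as EBUSY.

```c
#include <errno.h>

/* Toy errno-style trylock standing in for a pthread_mutex_trylock-like
   function. Not thread-safe: it only models the return-code convention. */
int hyp_pthread_trylock(int *lockword)
{
    if (*lockword)
        return EBUSY; /* POSIX style: errno value as the result */
    *lockword = 1;
    return 0;
}

/* The C11-style name is a weak alias: same code, same return values. */
int hyp_mtx_trylock(int *lockword)
    __attribute__((weak, alias("hyp_pthread_trylock")));

/* The alias is only correct because these codes are errno-valued. */
enum {
    hyp_thrd_success = 0,
    hyp_thrd_busy = EBUSY,
};
```

With small consecutive result codes instead, each C11 function would need a thin wrapper that translates the result, which is the trade-off weighed in the replies.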

Actually, I disagree with your mappings. thrd_error is not going to
come from EINVAL (IIRC the only place POSIX specifies EINVAL anyway is
as an optional error when you already invoked UB) but rather from
attempting to lock a recursive mutex when the count is already maxed
out. And thrd_nomem is not going to come from ENOMEM, since
pthread_create returns EAGAIN, not ENOMEM, when memory for the thread
cannot be allocated.

With this in mind, your idea of using errno codes seems less and less
reasonable to me.

For reference, here's a summary of the only possible errors I see:

thrd_timedout - for timed operations

thrd_busy - for mtx_trylock

thrd_error - for mtx_lock or trylock (maxed recursive lock count), or
tss_create (no more tss slots)

thrd_nomem - for thrd_create

All of the other cases looked like things that cannot happen with musl
(or with any reasonable implementation).
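The list above also implies that, if the C11 functions are thin wrappers over pthread calls, the translation has to be done per operation rather than with one errno-to-thrd table: EAGAIN from pthread_create becomes thrd_nomem, while EAGAIN from pthread_mutex_lock (recursion count maxed out) becomes thrd_error. A hypothetical sketch (the from_* helpers are invented; the thrd_* constants come from threads.h, with whatever values the implementation chose):

```c
#include <errno.h>
#include <threads.h>

/* thrd_create: pthread_create reports EAGAIN, not ENOMEM, when it
   cannot obtain the resources (memory) for a new thread. */
int from_create(int err)
{
    if (err == 0)
        return thrd_success;
    return err == EAGAIN ? thrd_nomem : thrd_error;
}

/* mtx_lock, mtx_trylock, mtx_timedlock. */
int from_lock(int err)
{
    switch (err) {
    case 0:         return thrd_success;
    case EBUSY:     return thrd_busy;     /* only from trylock */
    case ETIMEDOUT: return thrd_timedout; /* only from the timed lock */
    default:        return thrd_error;    /* e.g. EAGAIN: count maxed out */
    }
}

/* tss_create: running out of key slots is the expected failure. */
int from_key_create(int err)
{
    return err == 0 ? thrd_success : thrd_error;
}

/* cnd_timedwait */
int from_timedwait(int err)
{
    if (err == 0)
        return thrd_success;
    return err == ETIMEDOUT ? thrd_timedout : thrd_error;
}
```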

> > Obviously if the error values are used directly, duplicating them in
> > another header is more trouble since they vary per-arch. This is part
> > of why I would actually prefer not to use them for the thread function
> > result codes, but which we do will depend on which way glibc does it.
> > I can check in with them and see if they have a plan yet.
> 
> yes, that would be good
> 
> I, on my side, will try to have something added to the C specification
> that threads.h also includes errno.h. For the moment it is only
> specified that it does so with time.h.
> 
> Perhaps it would even be good to have the thrd-constants also be
> exported by errno.h. These are error codes, after all.

I disagree here. There are many different kinds of result codes in C
and POSIX which are in different domains. If the thrd_* result codes
had been intended to be in the same domain as errno values, these
functions would have just been specified to return errno values. It's
possible that they might match errno values, but that's certainly an
implementation detail, not something that applications could
reasonably depend on, and therefore I think it makes no sense to
require or allow threads.h to expose errno.h.
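In application terms, the portable approach is to compare results only against the thrd_* names, since whether thrd_busy equals EBUSY is an implementation detail. A minimal sketch along those lines (try_bump is a hypothetical helper) that never needs errno.h:

```c
#include <threads.h>

/* Increment *counter only if the mutex is free right now.
   Returns 1 if the work was done, 0 if the mutex was busy, -1 on error. */
int try_bump(mtx_t *m, int *counter)
{
    switch (mtx_trylock(m)) {
    case thrd_success:
        ++*counter;
        mtx_unlock(m);
        return 1;
    case thrd_busy: /* compared by name, never against EBUSY */
        return 0;
    default:        /* thrd_error */
        return -1;
    }
}
```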

Rich