Date: Tue, 5 Aug 2014 22:02:54 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: PATCH: don't call cleanup handlers after a regular return
 from the thread start function

On Wed, Aug 06, 2014 at 01:19:58AM +0200, Jens Gustedt wrote:
> Am Dienstag, den 05.08.2014, 17:48 -0400 schrieb Rich Felker:
> > On Tue, Aug 05, 2014 at 11:05:41PM +0200, Jens Gustedt wrote:
> > > It is not explicitly forbidden and I think this is sufficient to allow
> > > for it.
> > 
> > Actually I don't think it's possible to implement a call_once that
> > supports longjmp'ing out without a longjmp that does unwinding,
> 
> A simple longjmp, perhaps not; a longjmp that is followed by thread
> termination, yes. I have just done it, so it is possible :)
> 
> As I mentioned earlier, you can do this by having everything
> statically allocated. I am using a struct for once_flag that has the
> `int` for control, another `int` for waiters and a `struct __ptcb` for
> the call to _pthread_cleanup_push. call_once is just a copy of
> pthread_once, but using these fields of the once_flag instead of the
> local variables.

This doesn't solve the problem. The problem is not the lifetime of the
cancellation buffer (which doesn't even need to exist for call_once),
but the fact that there's no way to track that the in-progress call
was "finished" by longjmp so that another one can be started.

BTW a related issue is that recursive calls to call_once are
unspecified: what happens if the init function passed to call_once
itself calls call_once with the same once_flag object? :)
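
For instance (with the sketch above in mind, the inner call sees the
flag "in progress" and waits on itself):

    #include <threads.h>

    static once_flag flag = ONCE_FLAG_INIT;

    static void init(void)
    {
        call_once(&flag, init);   /* recursive use of the same flag */
    }

    int main(void)
    {
        call_once(&flag, init);   /* likely self-deadlock -- unspecified */
        return 0;
    }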

I've reported the pthread_once issue here, and added a note on the
other related items (recursive calls and pthread_exit) that came up
later:

http://austingroupbugs.net/view.php?id=863

> (Strictly speaking, once_flag is not required to be statically
> allocated, as is explicitly required for pthread_once_t. Yet
> another omission in C11; it is on my list of additions.)

In principle, it's nice for a call_once-like function not to require
static storage. It means you can use it inside dynamically allocated
objects for ctor-like purposes where the ctor isn't always needed and
can be run lazily on the first operation that needs it. Of course
call_once is useless for this since it doesn't allow passing an
argument to the function. So I have no objection to requesting that
C11 be amended to require static storage for once_flag.
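
For illustration, this is the sort of per-object lazy-init usage that
a hypothetical argument-taking variant would permit (call_once_arg
below is made up, not an existing or proposed interface):

    #include <threads.h>

    /* Hypothetical interface, shown only to illustrate the limitation;
     * the real call_once callback takes no argument. */
    void call_once_arg(once_flag *flag, void (*func)(void *), void *arg);

    struct conn {
        once_flag once;   /* would also need once_flag to be valid here */
        int cache;
    };

    static void conn_init(void *p)
    {
        struct conn *c = p;
        c->cache = 42;    /* stand-in for an expensive lazy ctor */
    }

    int conn_get(struct conn *c)
    {
        call_once_arg(&c->once, conn_init, c);
        return c->cache;
    }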

> > and I
> > don't think it's the intent of WG14 to require unwinding. The
> > fundamental problem is the lack of any way to synchronize "all
> > subsequent calls to the call_once function with the same value of
> > flag" when the code to update flag to indicate completion of the call
> > is skipped.
> 
> sure

So I don't see how you would solve this problem without just banning
use of longjmp from call_once.

> > Do you see any solution to this problem? Perhaps a conforming
> > implementation can just deadlock all subsequent calls, since (1)
> > calling the function more than once conflicts with the "exactly once"
> > requirement, and (2) the completion of the first call, which
> > subsequent calls have to synchronize with, never happens.
> 
> one part of solution is the "static" implementation as mentioned
> above,

I don't see how this fixes it. The cancellation buffer is not the
issue that matters. The issue is that what to do isn't even specified,
and unless the "what to do" is just deadlock, there's no good way to
detect that the once_flag object was 'released' by a longjmp to allow
it to be taken again.

> followed by a fight to get the requirement for static
> allocation and a prohibition of longjmp and thrd_exit into the
> specification of call_once.

I don't think it will be much of a fight. BTW while you're at it,
there might be a number of additional issues -- like what happens if
atexit handlers call thrd_exit, what happens if tss dtors longjmp,
etc. As far as I can tell, none of this is properly specified and IMO
it should all be explicit UB.

> > - In the case of longjmp, it's already dangerous to longjmp across
> >   arbitrary code. Unless you know what code you're jumping over is
> >   doing or have a contract that it's valid to jump over it, you can't
> >   use longjmp.
> 
> sure, but just consider a nice tool library that implements
> try/catch-style constructs with setjmp/longjmp.

It's not possible to mix this with cancellation (or with C++
exceptions, for that matter). But as long as neither "unwinds" across
the other, you're fine. This is a restriction you need to follow and
be aware of anyway. It's not something we can "work around" except by
using the same underlying implementation for all three, and that's
undesirable for various reasons (one of which is that people should
not be relying on it).

Even if you did want to support this, making it work is a lot harder
than allowing longjmp out of pthread_once/call_once. The latter are
special cases where there's at most one cancellation cleanup layer and
it's implementation-internal. The general case has arbitrary
cancellation cleanup buffers at different call frame contexts, and
they're most certainly not static.

> > > For the case of pthread_once or call_once, we could get away with it
> > > by having the `struct __ptcb` static at that place. The caller of the
> > > callback holds a lock at the point of call, so there would be no race
> > > on such a static object.
> > > 
> > > I will prepare a patch for call_once to do so, just to be on the safe
> > > side. This would incur 0 cost.
> > 
> > The cost is encoding knowledge of the cancellation implementation at
> > the point of the call, and synchronizing all calls to call_once with
> > respect to each other, even when they don't use the same once_flag
> > object.
> 
> No, I don't think so, and with what I explained above this is perhaps
> a bit clearer. The only thing that this does is make all the state
> of a particular call to call_once statically allocated.

OK, I misunderstood your design.

> (There is even less interaction between calls to call_once with
> different objects than we have for the current implementation of
> pthread_once. That one uses a `static int waiters` that is shared
> between all calls.)

Yes, that's an unfortunate issue which I should fix. But making the
once_flag a big structure rather than a single int (which will surely
disagree with the glibc ABI) is not the solution. The solution is just
to have a potential-waiters flag in the single int, rather than a
waiter count. The incidence of contention is so low that it doesn't
matter if it results in a rare excess syscall, especially since once
calls are once-per-program-invocation.
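
Roughly the following shape (a sketch of the waiters-bit idea only,
not an actual patch; a_cas/a_swap and futex_wait/futex_wake stand in
for the usual atomic and futex primitives, and cancellation cleanup
is omitted for clarity):

    /* One int holds the whole state:
     * 0 = never run, 1 = in progress,
     * 2 = in progress with potential waiters, 3 = done. */
    static void do_once(volatile int *control, void (*init)(void))
    {
        if (a_cas(control, 0, 1) == 0) {
            init();
            /* Issue the wake syscall only if a waiter announced itself. */
            if (a_swap(control, 3) == 2)
                futex_wake(control, INT_MAX);
        } else {
            int c;
            while ((c = *control) != 3) {
                if (c == 1)
                    a_cas(control, 1, 2);   /* announce potential waiter */
                futex_wait(control, 2);     /* sleep while value is 2 */
            }
        }
    }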

> > call_once does not have specified behavior if the callback is
> > cancelled (because, as far as C11 is concerned, cancellation does not
> > even exist). Thus there is no need for a cleanup function at all.
> 
> After digging into it I don't agree. In a mixed pthread/C thread
> environment cancellation will eventually happen, so we need a cleanup
> function to avoid deadlocks.

No it won't. Cancellation is unusable with almost all existing code
(and almost all new code). To use cancellation, the target thread must
be running code which was written to be aware of the possibility of
being cancelled. Thus, it's not going to "just happen".

I have an in-progress proposal for an alternate cancellation mode
where the first cancellation point called with cancellation pending
would not act on it, but instead fail with ECANCELED. Setting this
mode would allow calling arbitrary third-party library code from
threads that might be subject to cancellation, and it would actually
behave as desired if the callee properly checks for errors and passes
error status back up to the caller. Unfortunately there are enough
details that I haven't worked out (not in the implementation, but in
how some special cases should behave) that I haven't implemented it or
proposed it yet.
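
As a sketch of how callers would see it (the mode itself is
hypothetical; assume the thread opted into it beforehand), a
cancellation point simply fails, and ordinary error handling carries
the cancellation back up the stack:

    #include <unistd.h>

    /* Assume this thread has opted into the hypothetical "fail with
     * ECANCELED" cancellation mode; no such mode exists today. */
    ssize_t read_all(int fd, void *buf, size_t n)
    {
        size_t got = 0;
        while (got < n) {
            ssize_t r = read(fd, (char *)buf + got, n - got);
            if (r < 0)
                return -1;   /* errno may be ECANCELED; caller unwinds */
            if (r == 0)
                break;
            got += r;
        }
        return (ssize_t)got;
    }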

> > I think we should wait to do anything unnecessary, ugly, and/or
> > complex here unless the next issue of POSIX specifies a behavior for
> > this case.
> 
> too late, done already, and it is not too ugly, either

What's already done?

> > > The same could be used for pthread_once.
> > 
> > The above analysis about longjmp being impossible to support without
> > deadlock or unwinding in longjmp also applies to pthread_once. I
> > suspect, if longjmp is implicitly allowed, this is just an oversight
> > and one which will be corrected.
> 
> right
> 
> also, introducing the same idea as above would be an ABI change, so
> let's not touch this

Indeed.

> > We're talking about a return statement in between the
> > pthread_cleanup_push and pthread_cleanup_pop calls in a particular
> > block context. Third-party library code does not magically insert such
> > a statement in your code. So I don't understand the case you have in
> > mind.
> 
> No, it is not only return; it is also longjmp. As mentioned above,
> consider a pure C11 context where I use a tool box that implements
> try-catch things by means of setjmp/longjmp.

This is UB and it won't be supported. This decision was made a long
time ago.
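
For concreteness, this is the kind of pattern that is and will remain
undefined (a minimal sketch):

    #include <pthread.h>
    #include <setjmp.h>

    static jmp_buf env;

    static void unlock_it(void *m) { pthread_mutex_unlock(m); }

    static void throwing_callback(void) { longjmp(env, 1); }

    static void with_lock(pthread_mutex_t *m, void (*cb)(void))
    {
        pthread_mutex_lock(m);
        pthread_cleanup_push(unlock_it, m);
        cb();                   /* longjmp from here skips the pop below */
        pthread_cleanup_pop(1); /* never reached on the longjmp path: UB */
    }

    void caller(pthread_mutex_t *m)
    {
        if (!setjmp(env))
            with_lock(m, throwing_callback);
        /* the mutex is still locked here, and worse is permitted */
    }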

> Then I use another, independent library that uses callbacks, which can
> be graphics, OpenMP, libc, whatever. And perhaps on some POSIX system
> it uses pthread_cleanup_push under the hood.
> 
> It will be hard to ensure that both tools, when used together on an
> arbitrary platform that implements them, will not collide with the
> requirement not to use longjmp across a pthread_cleanup context.

No it's not. Only if one is calling into the other, which calls back
into the first, and they're mixing their unwinding mechanisms across
these boundaries, can any such issue happen. And this is explicitly UB
and explicitly nonsense.

> (BTW, in the musl timer code I just detected a nasty usage of longjmp
> from within a cleanup handler. That one is ugly :)

Yes, but this is implementation-internal and used correctly based on
the properties of the cleanup implementation. It's needed to reuse the
same thread (which is needed because we can't necessarily make a new
one).

Rich
