Date: Tue, 12 Aug 2014 20:30:11 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: bug in pthread_cond_broadcast

On Wed, Aug 13, 2014 at 12:50:19AM +0200, Jens Gustedt wrote:
> > I'd like to find a fix that
> > would be acceptable in the 1.0.x branch and make that fix before
> > possibly re-doing the cond var implementation (a fix which wouldn't be
> > suitable for backporting).
> 
> Some thoughts:
> 
> Basically, in "unwait" there shouldn't be any reference to c->.  No
> pending thread inside timedwait should ever have to access the
> pthread_cond_t again; it might already be heavily used by other threads.

As far as I can see, there must be: since "unwait" potentially
releases the association of the mutex with the cond var, "unwait" and
broadcast need to mutually exclude one another, so that broadcast can
know whether there are zero waiters (in which case the mutex can
legally be destroyed by the last waiter, and broadcast cannot access
it) or at least one waiter that cannot re-acquire the mutex until the
broadcast is finished.

The only way I can see around this "must" is to do away with requeue
entirely and have broadcast wake all waiters, never inspecting the
mutex at all. This is certainly a lot simpler (it's what we do for
process-shared cond vars anyway) but performance is much worse.
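
To make the "wake all waiters" alternative concrete, here is a minimal
Linux-only sketch. It is not musl's code: struct simple_cond, futex_op,
simple_cond_wait, simple_cond_broadcast and the run_demo harness are all
hypothetical names, and the design (one sequence counter used as the
futex word) is just one simple way to do it. The point is that broadcast
never looks at the mutex, so there is nothing to requeue and nothing to
exclude against; the cost is that every waiter wakes and then contends
for the mutex.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical minimal cond var: one sequence counter used as the futex word. */
struct simple_cond { atomic_int seq; };

static long futex_op(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Caller holds m. May return spuriously, so callers re-check their predicate. */
static void simple_cond_wait(struct simple_cond *c, pthread_mutex_t *m)
{
    int seq = atomic_load(&c->seq);     /* sampled while m is still held */
    pthread_mutex_unlock(m);
    futex_op(&c->seq, FUTEX_WAIT, seq); /* returns at once if seq changed */
    pthread_mutex_lock(m);
}

/* Wake everyone. The mutex is never inspected, so nothing is requeued. */
static void simple_cond_broadcast(struct simple_cond *c)
{
    atomic_fetch_add(&c->seq, 1);
    futex_op(&c->seq, FUTEX_WAKE, INT_MAX);
}

/* Demo state, protected by mtx. */
static struct simple_cond cv;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static int ready, woken;

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    while (!ready)
        simple_cond_wait(&cv, &mtx);
    woken++;
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Starts n waiters (at most 16), broadcasts once, returns how many woke. */
static int run_demo(int n)
{
    pthread_t t[16];
    if (n > 16) n = 16;
    ready = woken = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, waiter, NULL);
    pthread_mutex_lock(&mtx);
    ready = 1;
    pthread_mutex_unlock(&mtx);
    simple_cond_broadcast(&cv);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return woken;
}
```

Because the waiter samples seq before dropping the mutex, a broadcast
that follows a predicate change made under the mutex cannot be missed:
either the futex wait sees a changed value and returns immediately, or
the waiter is already asleep and gets woken.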

> The signalling or broadcasting thread (waker) should do most of the
> bookkeeping on the waiters counts. This might be done by
> 
>  - lock _c_lock
> 
>  - if there are no waiters, unlock _c_lock and quit
> 
>  - requeue the wanted number of threads (1 or everybody) from the cnd
>    to the mtx. requeue tells us how many threads have been requeued,
>    and this lets us deduce the number of threads that have been woken
>    up.

If you requeue here, where does any wake happen?
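
For reference on the question above: on Linux the FUTEX_CMP_REQUEUE
operation takes a wake count separately from the requeue limit, so one
call can wake some waiters and move others. The sketch below only shows
the shape of that call. The wrapper name wake_and_requeue and its
parameter names are hypothetical, and nothing here is taken from musl's
internals or from the proposal being discussed.

```c
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: wake up to n_wake waiters on `from` and move up to
 * n_move of the remaining ones onto `to`. The kernel proceeds only if *from
 * still equals `expected`; otherwise the call fails with EAGAIN.
 * Returns the kernel's count of waiters woken plus requeued, or -1 on error. */
static long wake_and_requeue(uint32_t *from, uint32_t *to,
                             int n_wake, int n_move, uint32_t expected)
{
    /* The requeue limit travels in the slot normally used for the timeout. */
    return syscall(SYS_futex, from, FUTEX_CMP_REQUEUE, n_wake,
                   (unsigned long)n_move, to, expected);
}
```

With n_wake set to 0 the call only moves waiters and wakes nobody, which
is the case the question is about.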

>  - verify that all wanted waiters are in, otherwise repeat the requeue
>    operation. (this should be a rare event)

This step is not possible. One or more waiters could be in signal
handlers which interrupted the wait, in which case the futex wait will
not resume until the signal handler returns. Such a retry loop could
run forever (e.g. if the signal handler is waiting for an event that
will only be performed by the [cond-var-]signaling thread after the
operation finishes).

>  - do the bookkeeping: update the cond-waiters count and add the right
>    amount to the mtx-waiters
> 
>  - unlock _c_lock
> 
> On the waiter side, you'd have to distinguish a successful wakeup by a
> waker from a spurious wakeup. Only for the latter does the waiter have
> to do the bookkeeping. This can only happen as long as the waker is in
> the "requeue" loop.

I don't understand what you mean.

> The only disadvantage that I see with such a procedure is that the
> waker is holding _c_lock when going into the futex call. 

This is probably a small issue compared to everything else.

Rich
