Date: Wed, 13 Aug 2014 08:34:16 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: bug in pthread_cond_broadcast

On Wed, Aug 13, 2014 at 09:00:56AM +0200, Jens Gustedt wrote:
> On Tuesday, 12.08.2014, at 20:30 -0400, Rich Felker wrote:
> > On Wed, Aug 13, 2014 at 12:50:19AM +0200, Jens Gustedt wrote:
> > > The signalling or broadcasting thread (waker) should do most of the
> > > bookkeeping on the waiters counts. This might be done by
> > >
> > > - lock _c_lock
> > >
> > > - if there are no waiters, unlock _c_lock and quit
> > >
> > > - requeue the wanted number of threads (1 or everybody) from the cnd
> > >   to the mtx. requeue tells us how many threads have been requeued,
> > >   and this lets us deduce the number of threads that have been woken
> > >   up.
> >
> > If you requeue here, where does any wake happen?
> >
> > > - verify that all wanted waiters are in, otherwise repeat the requeue
> > >   operation. (this should be a rare event)
> >
> > This step is not possible. One or more waiters could be in signal
> > handlers which interrupted the wait,
>
> yes, but only one waiter at the time can be in the initial phase of
> the wait, waiters always hold the mutex in question. So the waiters
> you are talking about are basically the ones that already released the
> mutex and are going into the futex-wait. There should be no signal
> handler waiting for an event coming from such a thread.

By "signal handler" I mean one in the sense of signal.h. The only way
to guarantee this would be to block signals during this interval, but
there's no way to atomically unblock them before going into the futex
wait, where they need to be unblocked, since the wait could last
arbitrarily long.

Anyway, the likely case is that the signal arrives _while_ in the futex
wait and thereby causes the wait to be interrupted and restarted later.
Technically there is unbounded time between the interruption and the
restart, but it's reasonable for one thread that's stuck in a signal
handler that interrupted a non-AS-safe function to block forward
progress in other threads, so on further consideration I don't think
your retry-loop idea is invalid.

> So basically you can assume that waiters have done their part of the
> bookkeeping when you are in that situation.

It would be possible to ensure that they have finished all their
bookkeeping (although mildly expensive, via syscalls to block signals),
but it's not possible to ensure that they are actually in the futex
wait syscall and able to receive requeues or wakes.

BTW I'm not sure what happens when a signal interrupts a wait that's
been requeued. It could be one of three things:

- Restarting the wait on the original futex address, which the
  application would necessarily have to arrange to contain a new value
  so that it fails with EAGAIN.

- Restarting the wait on the requeued address via poking at syscall
  argument values or use of a "restart block" containing the state for
  the interrupted syscall.

- EINTR and letting the application handle it.

Which one of these happens seems like it could make a big difference
to what usage patterns are valid, and I fear the behavior may differ
between kernel versions...

Rich
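
For illustration only, here is a minimal sketch of what the third case above (EINTR, with the application handling it) could look like from the waiter's side on Linux. The names futex_wait_raw, futex_wait_retry and enum wait_result are hypothetical and are not taken from musl; the sketch assumes the raw futex syscall is reached through syscall(2). It does not settle which of the three behaviors a given kernel actually implements. It only shows that a retry on the original address relies on the value there having changed, so that the retry fails with EAGAIN instead of sleeping again.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum wait_result { WAIT_WOKEN, WAIT_VALUE_CHANGED, WAIT_ERROR };

/* Hypothetical helper: thin wrapper over the raw futex syscall.
 * Sleeps only if *addr still equals expected at the time of the call. */
static long futex_wait_raw(int *addr, int expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wait on addr, retrying by hand if a signal handler interrupted the wait. */
static enum wait_result futex_wait_retry(int *addr, int expected)
{
    for (;;) {
        if (futex_wait_raw(addr, expected) == 0)
            return WAIT_WOKEN;          /* woken, possibly spuriously */
        if (errno == EINTR)
            continue;                   /* interrupted: retry on the original address */
        if (errno == EAGAIN)
            return WAIT_VALUE_CHANGED;  /* *addr != expected, so no sleep happened */
        return WAIT_ERROR;
    }
}
```

Note that if the waiter had been requeued to the mutex before the interruption, a retry like this one goes back to the original address, so it only avoids a lost wake if the waker changed the value stored there first.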