musl - Re: My current understanding of cond var access restrictions

Message-ID: <20140814061009.GA6599@brightrain.aerifal.cx>
Date: Thu, 14 Aug 2014 02:10:09 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: My current understanding of cond var access restrictions

On Thu, Aug 14, 2014 at 01:20:25AM +0200, Jens Gustedt wrote:
> > 4. When can signal and broadcast safely use the mutex?
> > 
> > Not at all, unless it can block waiters from exiting the wait. Any
> > waiter could spontaneously exit the wait as a result of cancellation,
> > timeout, or a cv signal from another thread, and by the above, it may
> > be entitled to destroy the mutex.
> 
> Are you suggesting that all waiters when coming back should first
> regain an internal lock on the cv?

I think I have an informal proof sketch that this is necessary unless
we abandon requeue:

If we want to be able to use the mutex in broadcast, which is needed
for requeue, then broadcast needs a lock that can block at least one
waiter from returning, and needs to confirm that at least one waiter
remains after the lock is obtained (otherwise it's easy -- there's no
work to do), so that the mutex is valid.

(Note: Broadcast can immediately release this lock if it determines
that the calling thread holds the mutex, since in this case, the mutex
will be sufficient to prevent any waiter from returning. But in
general it needs to hold the lock until requeue is performed.)
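
The broadcast-side argument above can be sketched roughly as follows.
This is only a toy model under stated assumptions, not musl's code:
the names toy_cv, toy_mutex, toy_lock and toy_broadcast are
hypothetical, the "requeue" is simulated by moving a counter rather
than by a real futex requeue, and the early-release optimization from
the note is deliberately left out so that the lock is always held
until the requeue is done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy types; NOT musl's real structures. */
struct toy_mutex {
    int waiters;              /* waiters accounted to the mutex */
};

struct toy_cv {
    atomic_flag lock;         /* internal lock; a waker holding it keeps
                                 waiters from returning */
    int waiters;              /* waiters still blocked on the cv */
    struct toy_mutex *mutex;  /* only meaningful while waiters > 0 */
};

static void toy_cv_init(struct toy_cv *cv)
{
    atomic_flag_clear(&cv->lock);
    cv->waiters = 0;
    cv->mutex = NULL;
}

static void toy_lock(atomic_flag *l)
{
    /* Simple spinlock; a real implementation would sleep on a futex. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire)) { }
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns the number of waiters moved over to the mutex. */
static int toy_broadcast(struct toy_cv *cv)
{
    toy_lock(&cv->lock);

    int n = cv->waiters;
    if (n == 0) {
        /* No waiter remains, so the mutex may already be destroyed.
           There is no work to do and the mutex is never touched. */
        toy_unlock(&cv->lock);
        return 0;
    }

    /* At least one waiter remains and cannot return while we hold the
       lock, so the mutex it will reacquire is still valid. */
    struct toy_mutex *m = cv->mutex;
    assert(m != NULL);

    /* Stand-in for the futex requeue: move the waiter count. */
    m->waiters += n;
    cv->waiters = 0;

    toy_unlock(&cv->lock);
    return n;
}
```

The point of the sketch is the ordering: the waiter count is checked
only after the internal lock is obtained, and the mutex is touched
only on the path where that count is nonzero.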

In order for this lock to block waiters from returning, any waiter
that woke possibly not under the control of broadcast/signal (i.e.
futex wait not returning 0) has to obtain the lock. (For safety
against application use of futexes that generates spurious wakes, it
might be best to just ignore the return value and always attempt to
get the lock.) This probably means it has to access the cv object
(unless it uses an object at another location whose address was
obtained before waiting), which in turn means that we have to track
references so that destroy can wait for all references to be released
before returning.
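
A minimal sketch of the waiter-side consequence, again as a toy model
rather than the actual implementation: every waiter takes a reference
before it starts waiting, always passes through the internal lock
after waking (ignoring the futex return value), and drops its
reference only after its last access to the cv. Destroy then waits for
the reference count to reach zero. All names here (toy_cv,
toy_wait_enter, toy_wait_exit, toy_try_destroy, toy_destroy) are
hypothetical, and the spin/yield loops stand in for futex waits.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical toy type; NOT musl's real pthread_cond_t layout. */
struct toy_cv {
    atomic_flag lock;  /* internal lock shared with signal/broadcast */
    atomic_int refs;   /* waiters that may still access this object */
};

static void toy_cv_init(struct toy_cv *cv)
{
    atomic_flag_clear(&cv->lock);
    atomic_init(&cv->refs, 0);
}

/* Called with the mutex still held, before the waiter unlocks it. */
static void toy_wait_enter(struct toy_cv *cv)
{
    atomic_fetch_add(&cv->refs, 1);
}

/* Called after the futex wait returns, for ANY reason. */
static void toy_wait_exit(struct toy_cv *cv)
{
    /* Always take the internal lock, so a waker that holds it can
       keep this waiter from returning. */
    while (atomic_flag_test_and_set_explicit(&cv->lock,
                                             memory_order_acquire)) { }
    atomic_flag_clear_explicit(&cv->lock, memory_order_release);

    /* Last access to the cv: release our reference. */
    int old = atomic_fetch_sub(&cv->refs, 1);
    assert(old > 0);
}

/* Returns 1 if no references remain, 0 if destroy would have to wait. */
static int toy_try_destroy(struct toy_cv *cv)
{
    return atomic_load(&cv->refs) == 0;
}

/* Blocks (by yielding) until every waiter has released its reference. */
static void toy_destroy(struct toy_cv *cv)
{
    while (!toy_try_destroy(cv))
        sched_yield();
}
```

The alternative mentioned above, using an object at another location
whose address was obtained before waiting, would move the lock out of
the cv and avoid this reference tracking; it is not modeled here.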

So I think we're stuck with something like the current implementation,
or abandoning requeue and just doing private cond vars the same as
process-shared ones. This is actually somewhat reassuring -- it means
I wasn't completely insane when I came up with the current
implementation a couple of years back. Or, if I was, the insane line
of reasoning is at least reproducible. :-)

With that in mind, I'd like to look for ways we can fix the bogus
waiter accounting for the mutex that seems to be the source of the bug
you found. One "obvious" (but maybe bad/wrong?) solution would be to
put the count on the mutex at the time of waiting (rather than moving
it there as part of broadcast), so that decrementing the mutex waiter
count is always the right thing to do in unwait. Of course this
possibly results in lots of spurious futex wakes to the mutex (every
time it's unlocked while there are waiters on the cv, which could be a
lot). It would be nice if we had a separate field in the mutex (rather
than in the cv, as it is now) to hold the count of cv waiters, and
only moved them to the active waiters count at broadcast time, but I
don't see any way to get additional space in the mutex structure for
this -- it's full.
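
To make the trade-off of the "obvious" approach concrete, here is a
small single-threaded model. It is only an illustration: toy_mutex and
the three functions are hypothetical names, plain ints stand in for
the atomics a real implementation would need, and the unlock function
merely reports whether a futex wake would be issued instead of making
the syscall.

```c
#include <assert.h>

/* Hypothetical toy mutex; NOT musl's real pthread_mutex_t layout. */
struct toy_mutex {
    int locked;
    int waiters;  /* includes threads that are still waiting on a cv */
};

/* The waiter is counted on the mutex as soon as it starts waiting on
   the cv, rather than being moved there by broadcast. */
static void toy_cond_wait_begin(struct toy_mutex *m)
{
    m->waiters++;
    m->locked = 0;  /* cond wait releases the mutex */
}

/* Leaving the wait for any reason (signal, timeout, cancellation):
   decrementing the mutex waiter count is always correct here. */
static void toy_unwait(struct toy_mutex *m)
{
    assert(m->waiters > 0);
    m->waiters--;
}

/* Returns 1 if this unlock would have to issue a futex wake. */
static int toy_mutex_unlock(struct toy_mutex *m)
{
    m->locked = 0;
    return m->waiters > 0;
}
```

In this model an unlock performed while a thread is merely parked on
the cv still reports that a wake is needed, which is the spurious
futex wake cost described above; once the waiter has gone through
unwait, unlock no longer does so.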

> > 5. When can [timed]wait safely access the cv?
> > 
> > Only before unlocking the mutex, unless the implementation
> > synchronizes with possible signaling threads, or with destruction (and
> > possibly unmapping). Otherwise, per the above, it's possible that a
> > signaling thread destroys the cv.
> 
> so again this suggests an internal lock on the cv that would be used
> to synchronize between waiters and wakers?

This argument applies even to process-shared cv's, and for them, no
allocation is possible, and I don't see a really good way to solve the
unmapping issue -- I think broadcast/signal would have to block
unmapping, and the last waiter to wake up would have to unblock it.
Maybe that's the right solution?

Rich