Date: Thu, 14 Aug 2014 11:36:15 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: My current understanding of cond var access restrictions

On Thu, Aug 14, 2014 at 10:41:10AM -0400, Rich Felker wrote:
> On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> > On Thursday, 14 Aug 2014 at 02:10 -0400, Rich Felker wrote:
> > > I think I have an informal proof sketch that this is necessary unless
> > > we abandon requeue:
> > 
> > > ...
> > 
> > > With that in mind, I'd like to look for ways we can fix the bogus
> > > waiter accounting for the mutex that seems to be the source of the bug
> > > you found. One "obvious" (but maybe bad/wrong?) solution would be to
> > > put the count on the mutex at the time of waiting (rather than moving
> > > it there as part of broadcast), so that decrementing the mutex waiter
> > > count is always the right thing to do in unwait.
> > 
> > sounds like a good idea, at least for correctness
> > 
> > > Of course this
> > > possibly results in lots of spurious futex wakes to the mutex (every
> > > time it's unlocked while there are waiters on the cv, which could be a
> > > lot).
> > 
> > If we were more careful not to spread wakes where we shouldn't,
> > there would perhaps not be "a lot" of such wakeups.
> 
> Well, this is different from the wake-after-release that you
> dislike. It's a wake on a necessarily-valid object that just doesn't
> have any actual waiters right now, because its potential waiters are
> still waiting on the cv.
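
To make that concrete: under the pre-move scheme, every unlock of a
mutex whose waiter count is nonzero has to issue a wake, even when
all of those "waiters" are in fact still blocked on the cv futex.
Roughly (a sketch with invented names, not actual musl code):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

struct mutex { volatile int lock; volatile int waiters; };

static void mutex_unlock(struct mutex *m)
{
	__atomic_store_n(&m->lock, 0, __ATOMIC_SEQ_CST);
	/* waiters includes cv waiters whose count was pre-moved
	 * here, so this wake may find nobody blocked on the futex;
	 * valid and harmless, but it costs a syscall on every
	 * unlock that looks contended. */
	if (m->waiters)
		syscall(SYS_futex, &m->lock, FUTEX_WAKE, 1, 0, 0, 0);
}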
> 
> However, I think it may be costly (one syscall per unlock) in
> applications where the mutex is used to protect state that's
> frequently modified but where the predicate associated with the cv
> only rarely changes (so signaling is rare and cv waiters wait around
> a long time). In what's arguably the common case (a reasonable
> number of waiters, as opposed to thousands of waiters on a 4-core
> box), just waking all waiters on broadcast would be a lot less
> expensive.
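
For comparison, here is what the two wake strategies look like at the
futex level (again just a sketch; the helper names are mine):

#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Plain broadcast: wake every cv waiter; each one then contends
 * for the mutex on its own, so no waiter count ever has to move. */
static void bcast_wake(volatile int *cv)
{
	syscall(SYS_futex, cv, FUTEX_WAKE, INT_MAX, 0, 0, 0);
}

/* Requeue: wake one waiter and shift the rest onto the mutex
 * futex, avoiding the thundering herd at the cost of exactly the
 * mutex waiter accounting we're discussing. */
static void bcast_requeue(volatile int *cv, volatile int *mtx)
{
	syscall(SYS_futex, cv, FUTEX_REQUEUE, 1, INT_MAX, mtx, 0);
}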
> 
> Thus I'm skeptical of trying an approach like this when it would be
> easier, and likely less costly in the common usage cases, simply to
> remove requeue and always use broadcast wakes. I modified your test
> case for the bug to use a process-shared cv (which uses broadcast
> wake), and as expected, the test runs with no failure.
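
(For reference, making the cv process-shared in the test is just the
standard attribute setup; as noted above, pshared cvs get plain
broadcast wakes:)

#include <pthread.h>

static void init_pshared_cv(pthread_cond_t *cv)
{
	pthread_condattr_t a;
	pthread_condattr_init(&a);
	pthread_condattr_setpshared(&a, PTHREAD_PROCESS_SHARED);
	pthread_cond_init(cv, &a);
	pthread_condattr_destroy(&a);
}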

A really ugly hack that might solve the problem: adaptively
switching the cv to a less efficient mode the first time a different
mutex is used with it. It could either switch to pre-moving wait
counts to the mutex, or revert to broadcast wakes.
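
The detection itself would be cheap; something along these lines,
with the field names invented for illustration:

#include <pthread.h>
#include <stdatomic.h>

struct cv {
	_Atomic(pthread_mutex_t *) mtx; /* first mutex seen, or 0 */
	_Atomic int slow;               /* sticky degraded-mode flag */
};

/* Called from the wait path with the caller's mutex. */
static void note_mutex(struct cv *c, pthread_mutex_t *m)
{
	pthread_mutex_t *first = 0;
	if (!atomic_compare_exchange_strong(&c->mtx, &first, m)
	    && first != m)
		atomic_store(&c->slow, 1);
}

Since the flag only ever goes one way, the common single-mutex case
stays on the fast path for the life of the cv.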

Rich
