Date: Sat, 29 Aug 2015 20:39:05 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 1/2] let them spin

On Sat, Aug 29, 2015 at 09:38:30PM +0200, Jens Gustedt wrote:
> Am Samstag, den 29.08.2015, 13:16 -0400 schrieb Rich Felker:
> > On Sat, Aug 29, 2015 at 10:50:44AM +0200, Jens Gustedt wrote:
> > > Remove a test in __wait that checked whether other threads had
> > > already attempted to go to sleep in futex_wait.
> > > 
> > > This has no impact on the fast path. But contrary to what one might
> > > think at first glance, this test slows things down when there is
> > > congestion.
> > > 
> > > Applying this patch shows no difference in behavior in a mono-core
> > > setting, so it seems that this shortcut is just superfluous.
> > 
> > The purpose of this code is twofold: improving fairness of the lock
> > and avoiding burning cpu time that's _known_ to be a waste.
> > 
> > If you spin on a lock that already has waiters, the thread that spins
> > has a much better chance to get the lock than any of the existing
> > waiters which are in futex_wait. Assuming sufficiently many cores that
> > all threads that are not sleeping don't get preempted, the spinning
> > thread is basically guaranteed to get the lock unless it spins so
> > long that it goes to futex_wait itself. This is simply because
> > returning from futex_wait (which all the other waiters have to do
> > once woken) takes a lot more
> > time than one spin. I suspect there are common loads under which many
> > of the waiters will NEVER get the lock.
> 
> Yes and no. I benched the things to know a bit more. On my machine one
> loop in a spin lock is just about 10 times faster than a failed call
> to futex_wait.

So this means that a non-preempted spinning thread will always get the
lock before a thread returning from futex_wait, as I predicted.

> Even for the current strategy, one of the futex_waiting threads gets
> woken up and gets his chance with a new spinning phase.

In the current musl code, threads that futex_wait do not go back to
spinning except in a rare race. __wait keeps repeating the futex_wait
syscall as long as *addr==val. Repeating the spinning would let them
be more aggressive about getting the lock against other threads
contending for it, but it burns a lot of cpu and gives up all the
fairness that you would otherwise get from the futex queue.

> So the difference isn't dramatic, just one order of magnitude and
> everybody gets his chance. These chances are not equal, sure, but
> NEVER in capitals is certainly a big word.

Try this: on a machine with at least 3 physical cores, 3 threads
hammer on the same lock, counting the number of times they succeed in
taking it. Once any one thread has taken it at least 10 million times
or so, stop and print the counts. With your spin strategy I would
expect to see 2 threads with counts near 10 million and one thread
with a count in the hundreds or less, maybe even a single-digit count.
With the current behavior (never spinning if there's a waiter) I would
expect all 3 counts to be similar.

> On the other hand, the difference in throughput on the multi-core
> setting between the different spin versions is dramatic for malloc, I
> find.

The problem we're dealing with is pathological bad cases, not
throughput in the lucky cases.

Rich
