Date: Thu, 30 Jul 2015 12:00:59 +0200
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: New optimized normal-type mutex?

On Thursday, 30.07.2015, 12:36 +0300, Alexander Monakov wrote:
> On Thu, 30 Jul 2015, Jens Gustedt wrote:
> 
> > On Thursday, 30.07.2015, 10:07 +0200, Jens Gustedt wrote:
> > > On Wednesday, 29.07.2015, 20:10 -0400, Rich Felker wrote:
> > > > On Thu, Jul 30, 2015 at 01:49:20AM +0200, Jens Gustedt wrote:
> > > > > Hm, could you be more specific about where this hurts?
> > > > > 
> > > > > In the code I have there is
> > > > > 
> > > > >         while (val & lockbit) {
> > > > >           __syscall(SYS_futex, loc, FUTEX_WAIT, val, 0);
> > > > >           val = atomic_load_explicit(loc, memory_order_consume);
> > > > >         }
> > > > > 
> > > > > so this should be robust against spurious wakeups, no?
> > > > 
> > > > The problem is that futex_wait returns immediately with EAGAIN if
> > > > *loc!=val, which happens very often if *loc is incremented or
> > > > otherwise changed on each arriving waiter.
> > > 
> > > Yes, sure, it may change. How often that happens will depend on the
> > > workload; I don't think we can easily make a quantitative statement
> > > here.
> > > 
> > > In the case of atomics the critical section is extremely short, and
> > > the count, even if it varies a lot, should have stabilized somewhat
> > > during the spinlock phase before we reach the futex part. The futex
> > > part is really only a last resort for the rare case that the thread
> > > holding the lock has been descheduled in the middle of the critical
> > > section.
> > > 
> > > My current test case has X threads hammering on one single
> > > location, with X up to a few hundred. On my 2x2 hyperthreaded CPU,
> > > for a reasonable number of threads (X = 16 or 32) I see an overall
> > > performance improvement of roughly 30% when using my version of the
> > > lock instead of the original musl one. The crossover point where
> > > the original musl lock becomes better is at about 200 threads.
> > > 
> > > I'll see how I can get hold of occurrence statistics for the
> > > different phases without being too intrusive (which would change
> > > the scheduling).
> > 
> > So I tested briefly varying the number of threads from 2 up to 2048.
> > 
> > Out of the loop iterations on the slow path, fewer than 0.1 % try to
> > go into futex wait, and of those about 20 % come back with EAGAIN.
> 
> That sounds like your testcase simulates a load where you'd be better off with
> a spinlock in the first place, no?

Hm, this is not a "testcase" in the sense of a synthetic benchmark: it
is the real code that I'd like to use for the generic lock-full
atomics. My test just exercises this lock-full path with a lot of
threads that all use the same head of a "lock-free" FIFO
implementation. The inner part of the critical section is just a
memcpy of a few bytes; for reasonable uses of atomics that should be
about 16 to 32 bytes.
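
To illustrate, the critical section around the FIFO head has roughly
this shape (just a sketch: the lkf_lock/lkf_unlock names and the
struct layout are invented for the example, they are not the real
interface):

    #include <stdatomic.h>
    #include <string.h>

    /* invented names, for illustration only */
    void lkf_lock(_Atomic(unsigned) *);
    void lkf_unlock(_Atomic(unsigned) *);

    typedef struct {
      _Atomic(unsigned) lock;   /* the lock word discussed here */
      unsigned char data[32];   /* 16 to 32 bytes of payload */
    } fifo_head;

    void fifo_head_read(fifo_head *h, unsigned char dst[32]) {
      lkf_lock(&h->lock);                    /* spin, futex only as last resort */
      memcpy(dst, h->data, sizeof h->data);  /* the whole critical section */
      lkf_unlock(&h->lock);
    }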

So this is really a use case that I consider important, and that I
would like to see implemented with similar performance.

(I haven't yet thought about turning this into a full-fledged mutex;
implementing the timed versions certainly needs some thought.)

> Have you tried simulating a load that does some non-trivial work between
> lock/unlock, making a spinlock a poor fit?

No. But I am not sure that there is such a case :)

With the idea that the counter doesn't change once a thread is inside
the lock-acquisition loop, there is much less noise on the lock value.
This has two benefits. First, the accesses in the loop are mostly
reads that check whether something has changed, not writes, so the bus
pressure should be reduced. Second, because there are fewer writes in
total, other threads that are inside the same loop perceive less
perturbation, and their futex wait has a good chance to succeed.
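
Roughly, the idea looks like this (a simplified sketch, not the actual
code; the lock-bit/waiter-count layout matches what I describe above,
but the spin count and the raw syscall() wrapper are only for
illustration):

    #include <stdatomic.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/futex.h>

    #define LOCKBIT 1u  /* low bit: lock held */
    #define WAITER  2u  /* one increment per thread in the acquisition loop */

    static void lkf_lock(_Atomic(unsigned) *loc) {
      unsigned val = 0;
      /* fast path: grab a free lock directly */
      if (atomic_compare_exchange_strong(loc, &val, LOCKBIT)) return;
      /* announce ourselves exactly once; from now on the value only
         changes when the lock bit flips, so waiters mostly read */
      val = atomic_fetch_add(loc, WAITER) + WAITER;
      for (;;) {
        /* spin phase: pure reads, no extra bus traffic from waiters */
        for (int spins = 0; spins < 100 && (val & LOCKBIT); ++spins)
          val = atomic_load(loc);
        if (!(val & LOCKBIT)) {
          /* take the lock and drop our waiter count in one CAS */
          if (atomic_compare_exchange_weak(loc, &val, (val - WAITER) | LOCKBIT))
            return;
          continue;
        }
        /* last resort: sleep; returns at once (EAGAIN) if *loc != val */
        syscall(SYS_futex, loc, FUTEX_WAIT, val, 0, 0, 0);
        val = atomic_load(loc);
      }
    }

    static void lkf_unlock(_Atomic(unsigned) *loc) {
      /* clear the lock bit; wake one waiter only if somebody is counted */
      if (atomic_fetch_sub(loc, LOCKBIT) != LOCKBIT)
        syscall(SYS_futex, loc, FUTEX_WAKE, 1, 0, 0, 0);
    }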

Jens


-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




