musl - Re: New optimized normal-type mutex?


Message-ID: <1438247427.10742.13.camel@inria.fr>
Date: Thu, 30 Jul 2015 11:10:27 +0200
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: New optimized normal-type mutex?

On Thursday, 30.07.2015 at 10:07 +0200, Jens Gustedt wrote:
> On Wednesday, 29.07.2015 at 20:10 -0400, Rich Felker wrote:
> > On Thu, Jul 30, 2015 at 01:49:20AM +0200, Jens Gustedt wrote:
> > > Hm, could you be more specific about where this hurts?
> > > 
> > > In the code I have there is
> > > 
> > >         for (;val & lockbit;) {
> > >           __syscall(SYS_futex, loc, FUTEX_WAIT, val, 0);
> > >           val = atomic_load_explicit(loc, memory_order_consume);
> > >         }
> > > 
> > > so this should be robust against spurious wakeups, no?
> > 
> > The problem is that futex_wait returns immediately with EAGAIN if
> > *loc!=val, which happens very often if *loc is incremented or
> > otherwise changed on each arriving waiter.
> 
> Yes, sure, it may change. Whether or not this is often may depend, I
> don't think we can easily make a quantitative statement, here.
> 
> In the case of atomics the critical section is extremely short, and
> the count, if it varies so much, should have a bit stabilized during
> the spinlock phase before coming to the futex part. That futex part is
> really only a last resort for the rare case that the thread that is
> holding the lock has been descheduled in the middle.
> 
> My current test case is having X threads hammer on one single
> location, X being up to some hundred. On my 2x2 hyperthreaded CPU for
> a reasonable number of threads (X = 16 or 32) I have an overall
> performance improvement of 30%, say, when using my version of the lock
> instead of the original musl one. The point of inversion where the
> original musl lock is better is at about 200 threads.
> 
> I'll see how I can get hold of occurrence statistics for the different
> phases without being too intrusive (which would change the
> scheduling).

So I tested briefly varying the number of threads from 2 up to 2048.

Out of the loop iterations on the slow path, less than 0.1 % try to go
into futex wait, and out of these about 20 % come back with EAGAIN.

In particular, the figure of only 20-30 % of the futex calls failing
with EAGAIN is quite stable.
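
For reference, the EAGAIN case counted here can be reproduced in
isolation. The following is only a minimal sketch, assuming Linux and
the raw futex syscall; the helper name futex_wait_once and the 1 ms
timeout are hypothetical and are not part of the code under test.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: one FUTEX_WAIT attempt on *loc.
 * Returns 0 if the call was woken by a FUTEX_WAKE, otherwise the
 * errno value of the failed call. The short relative timeout is an
 * assumption made only so that this demo can never block forever. */
static int futex_wait_once(uint32_t *loc, uint32_t expected)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
    long r = syscall(SYS_futex, loc, FUTEX_WAIT, expected, &ts, NULL, 0);
    return r == -1 ? errno : 0;
}
```

With a word holding 5, futex_wait_once(&word, 4) returns EAGAIN at
once because the kernel sees *loc != expected, whereas
futex_wait_once(&word, 5) sleeps until the timeout expires and returns
ETIMEDOUT.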

For me these figures show that the futex phase is really negligible
for performance and only serves as a last resort that protects us from
an attack.
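
One way such figures could be collected is to count events inside the
lock itself. The sketch below is not the code that was measured; it is
a simplified lock written under stated assumptions: the top bit of the
word is the lock bit, the low bits count the holder plus all
contenders, and the names demo_lock, demo_unlock, demo_stats and
SPIN_LIMIT are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdatomic.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

#define LOCKBIT    0x80000000u /* assumed layout: top bit means "locked" */
#define SPIN_LIMIT 100         /* assumed spin budget before sleeping */

/* Hypothetical event counters for the slow path. */
static struct {
    atomic_ulong slow_iters;   /* iterations of the slow-path loop */
    atomic_ulong futex_calls;  /* FUTEX_WAIT attempts */
    atomic_ulong futex_eagain; /* attempts that failed with EAGAIN */
} demo_stats;

static void demo_lock(_Atomic uint32_t *loc)
{
    uint32_t cur = 0;
    /* Fast path: the word is free, so take the lock and count ourselves. */
    if (atomic_compare_exchange_strong(loc, &cur, LOCKBIT + 1))
        return;
    /* Slow path: register as a contender first. */
    cur = atomic_fetch_add(loc, 1) + 1;
    for (unsigned spins = 0;;) {
        atomic_fetch_add(&demo_stats.slow_iters, 1);
        if (!(cur & LOCKBIT)) {
            /* Lock looks free: try to set the bit. On failure the CAS
             * stores the value it observed into cur. */
            if (atomic_compare_exchange_strong(loc, &cur, cur | LOCKBIT))
                return;
        } else if (spins < SPIN_LIMIT) {
            /* Spin phase: just look at the word again. */
            spins++;
            cur = atomic_load(loc);
        } else {
            /* Last resort: sleep until the holder wakes us, unless the
             * word has changed in the meantime (EAGAIN). */
            atomic_fetch_add(&demo_stats.futex_calls, 1);
            if (syscall(SYS_futex, loc, FUTEX_WAIT, cur, NULL, NULL, 0) == -1
                && errno == EAGAIN)
                atomic_fetch_add(&demo_stats.futex_eagain, 1);
            cur = atomic_load(loc);
        }
    }
}

static void demo_unlock(_Atomic uint32_t *loc)
{
    /* Clear the lock bit and remove ourselves from the count. */
    uint32_t prev = atomic_fetch_sub(loc, LOCKBIT + 1);
    /* Somebody else is still counted: wake one sleeper, if any. */
    if (prev != LOCKBIT + 1)
        syscall(SYS_futex, loc, FUTEX_WAKE, 1, NULL, NULL, 0);
}
```

The ratios futex_calls / slow_iters and futex_eagain / futex_calls then
correspond to the two percentages above. Shared atomic counters add
contention of their own, so per-thread counters that are summed at
the end would probably disturb the scheduling less.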

Jens


-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::



