musl - Re: New optimized normal-type mutex?

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.LNX.2.20.1507301228230.11825@monopod.intra.ispras.ru>
Date: Thu, 30 Jul 2015 12:36:19 +0300 (MSK)
From: Alexander Monakov <amonakov@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: New optimized normal-type mutex?

On Thu, 30 Jul 2015, Jens Gustedt wrote:

> Am Donnerstag, den 30.07.2015, 10:07 +0200 schrieb Jens Gustedt:
> > Am Mittwoch, den 29.07.2015, 20:10 -0400 schrieb Rich Felker:
> > > On Thu, Jul 30, 2015 at 01:49:20AM +0200, Jens Gustedt wrote:
> > > > Hm, could you be more specific about where this hurts?
> > > > 
> > > > In the code I have there is
> > > > 
> > > >         for (;val & lockbit;) {
> > > >           __syscall(SYS_futex, loc, FUTEX_WAIT, val, 0);
> > > >           val = atomic_load_explicit(loc, memory_order_consume);
> > > >         }
> > > > 
> > > > so this should be robust against spurious wakeups, no?
> > > 
> > > The problem is that futex_wait returns immediately with EAGAIN if
> > > *loc!=val, which happens very often if *loc is incremented or
> > > otherwise changed on each arriving waiter.
> > 
> > Yes, sure, it may change. Whether or not this is often may depend, I
> > don't think we can easily make a quantitative statement, here.
> > 
> > In the case of atomics the critical section is extremely short, and
> > the count, if it varies so much, should have a bit stabilized during
> > the spinlock phase before coming to the futex part. That futex part is
> > really only a last resort for the rare case that the thread that is
> > holding the lock has been descheduled in the middle.
> > 
> > My current test case is having X threads hammer on one single
> > location, X being up to some hundred. On my 2x2 hyperthreaded CPU for
> > a reasonable number of threads (X = 16 or 32) I have an overall
> > performance improvement of 30%, say, when using my version of the lock
> > instead of the original musl one. The point of inversion where the
> > original musl lock is better is at about 200 threads.
> > 
> > I'll see how I can get hold on occurrence statistics of the different
> > phases without being too intrusive (which would change the
> > scheduling).
> 
> So I tested briefly varying the number of threads from 2 up to 2048.
> 
> Out of the loop iterations on the slow path, less than 0.1 % try to go
> into futex wait, and out of these about 20 % come back with EGAIN.

That sounds like your testcase simulates a load where you'd be better off with
a spinlock in the first place, no?

Have you tried simulating a load that does some non-trivial work between
lock/unlock, making a spinlock a poor fit?

Alexander

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.