Date: Sun, 18 Jun 2017 16:20:09 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] a new lock algorithm with lock value and CS
 counts in the same atomic int

On Sun, Jun 18, 2017 at 09:32:09PM +0200, Jens Gustedt wrote:
> Hello Rich,
> 
> On Sun, 18 Jun 2017 12:04:59 -0400 Rich Felker <dalias@...c.org> wrote:
> 
> > > > Is there a reason __wait doesn't work?  
> > > 
> > > __wait doesn't fit here at all, for instance it manipulates a
> > > separate int with waiters count.  
> > 
> > It accepts a null pointer for the waiters count, so that it can be
> > used in contexts with or without one. Any cost of the additional
> > conditional branches is dominated by syscall time.
> 
> Looking into it, I don't agree with you, Rich. Even with waiters set
> to 0 it would do 100 spins before going into the syscall. This is
> quite a waste here, because we are either just coming out of a spin
> (on the first iteration) or we already spun for as long as the value
> was positive.

I haven't reviewed that logic yet, so perhaps that's what I'm missing.
I'll follow up with more after I understand that code better.

> I don't see why the cost of 100 spins would be dominated by the
> syscall. If I remember correctly, the benchmarks that I made showed
> about 10 memory operations for an unsuccessful syscall. This is why
> the magic number for the initial spin is set to 10.

A syscall takes at least 500 cycles on an extremely fast system, and
more like 1500-2000 on many. 100 spins was determined empirically and
is probably near the break-even point. If it's a bad choice, it's
probably also bad for other users of __wait.

> It might be beneficial to do a_spin for a while if we know that the
> critical sections protected by this lock are really short, just a few
> cycles. But 100 is far too big a number, and in my benchmarks I found
> little indication of a benefit from it.
> 
> If we want code sharing with the rest of musl (which we should), I
> like Alexander's idea of a __futexwait inline function much better.

I don't think there's any value to making it inline. If it could be a
single syscall, that would be one thing, but with the fallback for old
systems that lack private futex, it's just a gratuitously large inline
chunk that's likely to interfere with inlining/optimization of the
caller, and certainly has no potential to improve performance (since
there's a syscall involved).

The original intent was that __wait be the "__futexwait" primitive
Alexander had in mind. If it's too heavy as it is now, perhaps that's
a mistake that affects other usage too. I was never very happy with
the shaky theoretical foundations of the spinning, but I did feel like
I at least got it to the point where it didn't have pathologically bad
cases.

Rich
