musl - Re: [PATCH] a new lock algorithm with lock value and CS counts in the same atomic int

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170618150132.GV1627@brightrain.aerifal.cx>
Date: Sun, 18 Jun 2017 11:01:32 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] a new lock algorithm with lock value and CS
 counts in the same atomic int

On Fri, Jun 16, 2017 at 09:11:35AM +0200, Jens Gustedt wrote:
> A variant of this new lock algorithm has been presented at SAC'16, see
> https://hal.inria.fr/hal-01304108. A full version of that paper is
> available at https://hal.inria.fr/hal-01236734.
> 
> The main motivation of this is to improve on the safety of the basic lock
> implementation in musl. This is achieved by squeezing lock value and
> "waiter" count into a single int. Thereby an unlock operation does
> exactly one memory transfer (a_fetch_add) and never touches the value
> again, but still detects if a waiter has to be woken up.

Thanks! In the process of reviewing.

> Another effect could be improved performance, but don't take this too
> seriously, these lock/unlock functions are probably never used in
> performance critical parts of libc.

An equivalent inlined/open-coded version of the lock is used in
malloc, which is the only place I see where it's performance-critical.
If it really does perform significantly better it would make sense to
see if we can get the benefit in malloc too; that may be harder
because it would sacrifice the inlining unless we wanted to try to get
the larger new code inlined (which the compiler may not want to do, or
which may affect other levels of inlining in the same file).

> Depending on the architecture, the performance of the previous lock
> strategy was not always so great under no or medium congestion. Now, in
> case of no congestion, there are exactly two memory operations, one to
> lock an one to unlock. So if we would start to use this same new strategy
> also for user locks such as mutexes, semaphores or the locks used for
> lockfull atomics, we might see a performance improvement.

It definitely can't be used for mutexes that have to record an owner
(recursive, error-checking, and/or robust), or for semaphores, unless
we wanted to depend on double-word atomics. There's just too much
state in all these cases to fit in 32 bits. Normal mutexes could do it
though. Another case that would be very nice to optimize, but where I
don't see a way, is stdio FILE locks. They're recursive and thus need
to record an owner. The current implementation works more like an
errorchecking mutex abused as a recursive one (see also: an old SO
question on that topic), having the caller be responsible for knowing
whether it needs to unlock from whether it was already owned by the
calling thread at lock time.

Rich

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.