musl - Re: cpuset/affinity interfaces and TSX lock elision in musl

Message-ID: <20130517172902.GC20323@brightrain.aerifal.cx>
Date: Fri, 17 May 2013 13:29:03 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: cpuset/affinity interfaces and TSX lock elision in musl

On Fri, May 17, 2013 at 01:28:02PM +0200, Szabolcs Nagy wrote:
> * Daniel Cegiełka <daniel.cegielka@...il.com> [2013-05-17 09:41:18 +0200]:
> > >> 2) The upcoming glibc will have support for TSX lock elision.
> > >>
> > >> http://en.wikipedia.org/wiki/Transactional_Synchronization_Extensions
> > >>
> > >> http://lwn.net/Articles/534761/
> > >>
> > >> Is there any outlook for supporting TSX lock elision in musl?
> > >
> > > I was involved in the discussions about lock elision on the glibc
> > > mailing list, and from what I could gather, it's a pain to implement
> > > and whether it brings you any benefit is questionable.
> > 
> > There is currently no hardware support, so the tests were done in the
> > emulator. It's too early to say there is no performance gain.

I agree it's too early. That's why I said I'd like to wait and see
before doing anything. My view is that what glibc is doing is (1) an
experiment to see if it's worthwhile, and (2) a buzzword-compliance
gimmick whereby Linux vendors and Intel can show off that they have a
state-of-the-art new feature (regardless of whether it's useful).

> it's not the lock performance that's questionable
> but the benefits

Yes. An artificial benchmark to spam lock requests would not be that
interesting, and for real-world usage, it's a lot more questionable
whether lock elision would help or hurt. The canonical case where it
would hurt is:

1. Take lock
2. Do expensive computation
3. Output results via syscall
4. Release lock

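In this case, the expensive computation gets performed twice: the
syscall aborts the transaction, so the speculative work is discarded
and redone with the lock actually held. It may be possible to avoid
all of the costly cases by adaptively turning off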
elision for particular locks (or of course by having the application
manually tune it, but that's hideous), with corresponding complexity
costs. Unless the _gains_ in the good cases are sufficiently
beneficial, however, I think that complexity would be misspent.
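
To make the cost and the adaptive workaround concrete, here is a minimal
single-threaded simulation in C. It is only a sketch: the names
(elidable_lock, run_locked, SKIP_AFTER_ABORT, count_run) are hypothetical
and come from neither glibc nor musl, and the txn_commits flag stands in
for the hardware deciding whether a transaction commits. Real hardware
would roll back the speculative memory effects on abort; this simulation
does not roll anything back, it only counts how often the body executes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-lock state for adaptive elision. */
struct elidable_lock {
    int held;          /* 1 while the real lock is held (demo only) */
    int skip_elision;  /* acquisitions left that bypass elision */
};

/* Assumed tuning constant: how many acquisitions skip elision
   after one abort. */
enum { SKIP_AFTER_ABORT = 3 };

/* Runs body as a critical section. Returns true if the simulated
   transaction committed, false if the real lock had to be taken.
   txn_commits == false models an abort, e.g. caused by a syscall. */
static bool run_locked(struct elidable_lock *l,
                       void (*body)(void *), void *arg,
                       bool txn_commits)
{
    if (l->skip_elision > 0) {
        /* Elision is currently turned off for this lock. */
        l->skip_elision--;
    } else {
        body(arg);                 /* speculative attempt */
        if (txn_commits)
            return true;           /* committed, lock never taken */
        /* Abort: the work above is wasted. Back off from elision. */
        l->skip_elision = SKIP_AFTER_ABORT;
    }
    l->held = 1;                   /* fall back to real locking */
    body(arg);
    l->held = 0;
    return false;
}

/* Demo body: counts how many times the "expensive" work runs. */
static void count_run(void *p)
{
    ++*(int *)p;
}
```

Calling run_locked with txn_commits false on a fresh lock runs the body
twice, which is the wasted work described above; the following
SKIP_AFTER_ABORT acquisitions run it once each because elision is
bypassed. The extra counter and branch on every acquisition are a small
instance of the complexity cost.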

In some sense, perhaps a better place for lock elision would be at the
_compiler_ level. If the compiler could analyze the code and determine
that there is an unconditional path from the lock to the corresponding
unlock with no intervening external calls (think: adding or removing
an item from a linked list), it could add a code path that uses lock
elision rather than locking. However, this seems to require intricate
cooperation between the compiler and library implementation, which is
unacceptable to me...
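
For illustration, this is the kind of critical section meant here,
sketched in C. Every name (struct list, list_lock, list_push, list_pop)
is hypothetical, and a C11 atomic_flag spinlock stands in for a real
mutex. Between lock and unlock there are only a few loads and stores, no
external calls, and every path reaches the unlock, so it is the shape a
compiler would have to recognize before emitting an elided path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

struct list {
    atomic_flag lock;    /* stand-in for a real mutex */
    struct node *head;
};

static void list_lock(struct list *l)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->lock,
                                             memory_order_acquire))
        ;
}

static void list_unlock(struct list *l)
{
    atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

/* Push at the head: two stores between lock and unlock. */
static void list_push(struct list *l, struct node *n)
{
    list_lock(l);
    n->next = l->head;
    l->head = n;
    list_unlock(l);
}

/* Pop from the head: one branch, but both arms reach the unlock. */
static struct node *list_pop(struct list *l)
{
    list_lock(l);
    struct node *n = l->head;
    if (n)
        l->head = n->next;
    list_unlock(l);
    return n;
}
```

By contrast, a critical section that calls write() or any function the
compiler cannot see into would not qualify under this criterion.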

> locks should not be the bottleneck in applications
> unless there is too much shared state on hot paths,
> which is probably a design bug or a special use-case
> for which non-standard synchronization methods may
> be better anyway

One place where there is unfortunately a huge amount of shared state
is memory management; this is inevitable. Even if we don't use lock
elision for pthread locks, it might be worth considering using it
_internally_ in malloc when it's available. It's hard to say without
any measurements, but this might result in a malloc that beats
ptmalloc, etc. without any thread-local management.

Rich