Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 20 Jun 2017 17:36:29 +0200
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] (V2) a new lock algorithm with lock value and CS
 counts in the same atomic int

Hello Alexander,

On Tue, 20 Jun 2017 17:41:35 +0300 (MSK) Alexander Monakov
<amonakov@...ras.ru> wrote:

> > +		/* We can only go into wait, if we know that
> > somebody holds the
> > +		   lock and will eventually wake us up, again. */
> > +		if (current < 0) {
> > +			__futexwait(l, current, 1);  
> 
> Here you reuse the value of 'current' after resuming after a
> futex-wait. But if the lock is under contention, with high
> probability other threads have modified the lock while we were
> sleeping, so the value of 'current' is likely stale, and the
> following a_cas is pretty much guaranteed to fail. Wouldn't it be
> better to reload current = *l after resuming?

Not to my experience. The benchs that you can find in the references
that I have given, clearly show that this strategy is better than the
current lock strategy (and a lot of variants that I tried.) These
benchmarks were performed on two architectures that are
representative, I think: x86 for one that locks the bus and arm for
one that uses a monitor. (And the gain on arm is more important.)

For all what I found, the real cost here is the bus transfer (caches
are probably not uptodate, anyhow) so doing a CAS compared to a load
is not much more expensive. On the other hand, if we guess "current"
correctly, we are done. Also this guess is not as bad as it might
appear. Comming out of the wait should usually be after a wakeup, were
just the current lock holder left the CS. So the educated guess in
that situation, that the lock is free and there is one thread less in
the CS, works quite well. And if it doesn't, and all other threads of
the CS are in kernel sleep, the next iteration does the trick.

So for the usual case you'd deal one forced load plus one CAS, against
a single CAS in the good case, and two CAS in the bad one. After
coming back from a kernel sleep, you have other things to worry
(reloading all your data, e.g).

Again, I'd like to emphasize that the performance of this part is not
very important. I wouldn't know a program could hammer as much on a
lock that uses this code inside musl, such that a lot of threads are
stuck in that congestion loop.

Thanks
Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::

Content of type "application/pgp-signature" skipped

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.