Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 28 Jul 2015 12:07:37 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: What's left for 1.1.11 release?

On Tue, Jul 28, 2015 at 05:15:29PM +0200, Jens Gustedt wrote:
> > > The operation for which you need acquire consistency, is in fact the
> > > load of l[1]. Somehow the current approach is ambiguous to which is
> > > the atomic object. Is it l[0], is it l[1] or is it the pair of them?
> > 
> > l[0] is the lock word. l[1] is the waiters count and while it's
> > modified atomically, the read is relaxed-order. Contrary to my
> > expectations, real-world x86 chips will actually reorder the read of
> > l[1] before the store to l[0], resulting in a failure-to-wake
> > deadlock.
> 
> ok, I understand the arch issue now.
> 
> But then, again, it seems that this failure-to-wake deadlock would be
> relevant to my stdatomic implementation.

Yes, I think it would.

> > > To be safe, I think this needs a full cmpxchg on the pair (l[0],
> > > l[1]), otherwise you can't know if the waiter count l[1] corresponds
> > > to the value just before the release of the lock.
> > 
> > No. As long as a_store behaves as a full barrier (including acquire
> > behavior) as it was intended to, you cannot read a value of l[1] older
> > than the value it had at the time of a_store, because there's a
> > globally consistent order between the a_inc and a_store.
> 
> Yes, you are talking of the intended behavior, but which you said
> isn't achieved. I was talking of one possible scenario to resolve that
> problem.
> 
>  - One possibility, that you talked about previously, is to introduce
>    an additional fence after the store to l[0] and so the read of l[1]
>    would be guaranteed to be no older than that.

Right.

>  - The other possibility is that you force the two values to be
>    exchanged in one atomic operation, since you have to do one full
>    read and one full write, anyhow. Here again you'd have several
>    possibilities. One would be to ensure that the atomic operation is
>    a cmpxchg on the pair.

The vast majority of archs do not have double-word CAS, so I don't
want to depend on it. In any case, I don't see how it would be any
faster than doing a swap/cas type operation on the lock word followed
by a read of the waiter count, which in turn might be slower than the
solution I proposed.

>    Another possibility would be to squeeze the lock bit and the wait
>    counter into a single int, and operate on the bit with some
>    fetch_and and fetch_or operations. But that would probably be much
>    more of a code change.

Yes, this is the "new normal-type mutex" thread which I think you
already saw and commented on. I suspect it's the preferred approach in
the long term, but I don't like using redesigns as bug fixes unless
there's something fundamentally wrong with the old design that makes
it impossible to fix. In this case there is no such issue. The lock
design is perfectly valid; x86 a_store is just buggy.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.