Date: Tue, 04 Oct 2022 08:16:16 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: Illegal killlock skipping when transitioning to
 single-threaded state

On 2022-10-04 00:27, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]:
> 
>> * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
>> > On 2022-09-19 18:29, Rich Felker wrote:
>> > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
>> ...
>> > > > Reordering the "libc.need_locks = -1" assignment and
>> > > > UNLOCK(E->killlock) and providing a store barrier between them
>> > > > should fix the issue.
>> > >
>> > > I think this all sounds correct. I'm not sure what you mean by a store
>> > > barrier between them, since all lock and unlock operations are already
>> > > full barriers.
>> > >
>> >
>> > Before sending the report I tried to infer the intended ordering semantics
>> > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
>> > see why they would provide a full barrier (my reasoning is below), so I
>> > concluded that probably acquire/release semantics was intended in general
>> > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
>> > = -1" store spelled after UNLOCK(E->killlock) back into the critical
>> > section.
>> >
>> > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
>> > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
>> > via load-acquire/store-release instructions. Therefore, if we consider a
>> > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
>> > such memory access can be reordered with the initial ldaxr in UNLOCK, and
>> > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
>> > the processor predicts that stlxr succeeds), and further, due to (a), with
>> > any memory access inside the critical section. Therefore, UNLOCK is not a
>> > full barrier. Is this right?
>> 
>> i dont think this is right.
> 
> 
> i think i was wrong and you are right.
> 
> so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
> starting with 'something == 0' the exiting E and remaining R threads:
> 
> E:something=1      // protected by killlock
> E:UNLOCK(killlock)
> E:need_locks=-1
> 
> R:LOCK(unrelated)  // reads need_locks == -1
> R:need_locks=0
> R:UNLOCK(unrelated)
> R:LOCK(killlock)   // does not lock
> R:read something   // can it be 0 ?
> 
> and here something can be 0 (ie. not protected by killlock) on aarch64
> because
> 
> T1
> 	something=1
> 	ldaxr ... killlock
> 	stlxr ... killlock
> 	need_locks=-1
> 
> T2
> 	x=need_locks
> 	ldaxr ... unrelated
> 	stlxr ... unrelated
> 	y=something
> 
> can end with x==-1 and y==0.
> 
Yes, overall this matches my understanding. However, your UNLOCK 
expansion (in T1/T2) omits the branch instruction between stlxr and the 
following store, and, as I mentioned, I don't know the AArch64 memory 
model well enough to say what effect that branch has on the order in 
which "stlxr killlock" (and the stores preceding it) and "need_locks=-1" 
become visible.

> and to fix it, both a_fetch_add and a_cas need an a_barrier.
> 
> i need to think how to support such lock usage on aarch64
> without adding too many dmb.
> 

I'd also like to point out that our discussion so far has focused on 
the reordering of "something" (which is protected by the critical 
section) with "need_locks=-1", but my original bug report also 
mentioned the possibility of lock corruption, and for that we're also 
interested in the reordering of the "stlxr" itself with 
"need_locks=-1". It's possible for an UNLOCK implementation to provide 
the first guarantee but not the second.
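
To make that concrete in C11 terms, the exit path I had in mind would 
need something like the following (a sketch, not actual musl code; 
musl would spell the fence as a_barrier, which is dmb ish on AArch64):

	atomic_store_explicit(&something, 1, memory_order_relaxed);
	/* UNLOCK(killlock): the release RMW (ldaxr/stlxr loop) */
	atomic_fetch_add_explicit(&killlock, -1, memory_order_release);
	/* full barrier: orders the critical-section stores *and* the
	 * lock word update itself before the need_locks store */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&need_locks, -1, memory_order_relaxed);

The reader side then also needs its LOCK to act as a full barrier 
(which is what giving a_cas an a_barrier achieves), so that its early 
load of need_locks cannot be reordered with its later reads of data 
that the skipped lock would have protected.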

> 
> 
>> 
>> memory operations in the critical section cannot visibly happen after 
>> the
>> final stlxr.
>> 
>> memory operations after the critical section cannot visibly happen 
>> before
>> the ldaxr of UNLOCK.
>> 
>> the only issue can be inside the ll/sc loop in UNLOCK if there are 
>> independent
>> memory operations there, but there arent.
>> 
You've already said this is wrong, but I want to spell out why, in the 
hope of clearing up some confusion for any interested parties. Both 
premises (about stlxr and ldaxr) are correct, but the third point 
doesn't follow from them and is wrong. Because ldaxr/stlxr act only as 
one-way barriers, the following pseudo-code

r1 = X
ldaxr LOCK
stlxr LOCK
r2 = Y

can be observed, in my understanding of the AArch64 memory model, as

ldaxr LOCK
r1 = X
r2 = Y
stlxr LOCK

and even as

ldaxr LOCK
r2 = Y
r1 = X
stlxr LOCK

The same is true for stores.

So effectively the loop in UNLOCK can absorb memory operations from 
both directions, which can then be reordered with each other (with the 
possible exception of stores that follow UNLOCK, as I said above).
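
For reference, the fix you describe (giving a_fetch_add a trailing 
a_barrier) would look roughly like this on AArch64. This is a 
simplified sketch based on my reading of arch/aarch64/atomic_arch.h 
and the generic a_fetch_add, not a tested patch:

static inline int a_ll(volatile int *p)
{
	int v;
	__asm__ __volatile__ ("ldaxr %w0, %1" : "=r"(v) : "Q"(*p));
	return v;
}

static inline int a_sc(volatile int *p, int v)
{
	int r;
	__asm__ __volatile__ ("stlxr %w0, %w2, %1"
	                      : "=&r"(r), "=Q"(*p) : "r"(v) : "memory");
	return !r;
}

static inline void a_barrier(void)
{
	__asm__ __volatile__ ("dmb ish" : : : "memory");
}

static inline int a_fetch_add(volatile int *p, int v)
{
	int old;
	do old = a_ll(p);
	while (!a_sc(p, (int)((unsigned)old + v)));
	/* the trailing full barrier is the new part: it prevents later
	 * accesses from being observed before the stlxr (and, since the
	 * stlxr is a release, before anything prior to it) */
	a_barrier();
	return old;
}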

Thanks,
Alexey
