musl - Re: Illegal killlock skipping when transitioning to single-threaded state

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b2628dbba894de61664271a42d320090@ispras.ru>
Date: Tue, 04 Oct 2022 13:18:31 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Cc: musl@...ts.openwall.com
Subject: Re: Illegal killlock skipping when transitioning to
 single-threaded state

On 2022-10-04 07:59, Rich Felker wrote:
> On Mon, Oct 03, 2022 at 11:00:21PM -0400, Rich Felker wrote:
>> On Mon, Oct 03, 2022 at 10:58:32PM -0400, Rich Felker wrote:
>> > On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
>> > > * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]:
>> > >
>> > > > * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
>> > > > > On 2022-09-19 18:29, Rich Felker wrote:
>> > > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
>> > > > ...
>> > > > > > > Reordering the "libc.need_locks = -1" assignment and
>> > > > > > > UNLOCK(E->killlock) and providing a store barrier between them
>> > > > > > > should fix the issue.
>> > > > > >
>> > > > > > I think this all sounds correct. I'm not sure what you mean by a store
>> > > > > > barrier between them, since all lock and unlock operations are already
>> > > > > > full barriers.
>> > > > > >
>> > > > >
>> > > > > Before sending the report I tried to infer the intended ordering semantics
>> > > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
>> > > > > see why they would provide a full barrier (my reasoning is below), so I
>> > > > > concluded that probably acquire/release semantics was intended in general
>> > > > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
>> > > > > = -1" store spelled after UNLOCK(E->killlock) back into the critical
>> > > > > section.
>> > > > >
>> > > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
>> > > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
>> > > > > via load-acquire/store-release instructions. Therefore, if we consider a
>> > > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
>> > > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
>> > > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
>> > > > > the processor predicts that stlxr succeeds), and further, due to (a), with
>> > > > > any memory access inside the critical section. Therefore, UNLOCK is not full
>> > > > > barrier. Is this right?
>> > > >
>> > > > i dont think this is right.
>> > >
>> > >
>> > > i think i was wrong and you are right.
>> > >
>> > > so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
>> > > starting with 'something == 0' the exiting E and remaining R threads:
>> > >
>> > > E:something=1      // protected by killlock
>> > > E:UNLOCK(killlock)
>> > > E:need_locks=-1
>> > >
>> > > R:LOCK(unrelated)  // reads need_locks == -1
>> > > R:need_locks=0
>> > > R:UNLOCK(unrelated)
>> > > R:LOCK(killlock)   // does not lock
>> > > R:read something   // can it be 0 ?
>> > >
>> > > and here something can be 0 (ie. not protected by killlock) on aarch64
>> > > because
>> > >
>> > > T1
>> > > 	something=1
>> > > 	ldaxr ... killlock
>> > > 	stlxr ... killlock
>> > > 	need_locks=-1
>> > >
>> > > T2
>> > > 	x=need_locks
>> > > 	ldaxr ... unrelated
>> > > 	stlxr ... unrelated
>> > > 	y=something
>> > >
>> > > can end with x==-1 and y==0.
>> > >
>> > > and to fix it, both a_fetch_add and a_cas need an a_barrier.
>> > >
>> > > i need to think how to support such lock usage on aarch64
>> > > without adding too many dmb.
>> >
>> > OK, after reading a lot more, I think I'm starting to get what you're
>> > saying. Am I correct in my understanding that the problem is that the
>> > "R:LOCK(unrelated)" as implemented does not synchronize with the
>> > "E:UNLOCK(killlock)" because they're different objects?
>> >
>> > If so, I think this would be fully solved by using __tl_sync in the
>> > code path that resets need_locks to 0 after observing -1, by virtue of
>> > providing a common object (the thread list lock) to synchronize on.
>> > This is the "weaker memory model friendly" approach we should probably
>> > strive to achieve some day.
>> >
>> > However, all existing code in musl is written assuming what I call the
>> > "POSIX memory model" where the only operation is "synchronizes memory"
>> > and that underspecified phrase has to be interpreted as "is a full
>> > barrier" to admit any consistent model. Said differently, the code was
>> > written assuming every a_* synchronizes with every other a_*, without
>> > any regard for whether they act on the same objects. This likely even
>> > matters for how our waiter accounting works (which is probably a good
>> > argument for removing it and switching to waiter flags). So I think,
>> > if the issue as I understand it now exists, we do need to fix it. Then
>> > we can revisit this at some later time as part of a big project.
>> 
>> Forgot, I should include links to the material I've been reading:
>> 
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697
>> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2
>> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba
>> 
>> which is where the GCC folks seem to have encountered and fixed their
>> corresponding issue.

Thank you for the links.

This comment[1] addresses my doubts on whether a store following (the 
current implementation of) UNLOCK can become visible before stlxr 
despite the conditional branch instruction in between (yes, it can).

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697#c23
> 
> Proposed patch attached.
> 
The patch looks good to me.

Alexey
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.