Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 3 Oct 2022 19:05:06 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com, Alexey Izbyshev <izbyshev@...ras.ru>
Subject: Re: Illegal killlock skipping when transitioning to
 single-threaded state

On Mon, Oct 03, 2022 at 06:54:17PM -0400, Rich Felker wrote:
> On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
> > * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]:
> > 
> > > * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
> > > > On 2022-09-19 18:29, Rich Felker wrote:
> > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> > > ...
> > > > > > Reordering the "libc.need_locks = -1" assignment and
> > > > > > UNLOCK(E->killlock) and providing a store barrier between them
> > > > > > should fix the issue.
> > > > > 
> > > > > I think this all sounds correct. I'm not sure what you mean by a store
> > > > > barrier between them, since all lock and unlock operations are already
> > > > > full barriers.
> > > > > 
> > > > 
> > > > Before sending the report I tried to infer the intended ordering semantics
> > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
> > > > see why they would provide a full barrier (my reasoning is below), so I
> > > > concluded that probably acquire/release semantics was intended in general
> > > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
> > > > = -1" store spelled after UNLOCK(E->killlock) back into the critical
> > > > section.
> > > > 
> > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
> > > > via load-acquire/store-release instructions. Therefore, if we consider a
> > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
> > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
> > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
> > > > the processor predicts that stlxr succeeds), and further, due to (a), with
> > > > any memory access inside the critical section. Therefore, UNLOCK is not full
> > > > barrier. Is this right?
> > > 
> > > i dont think this is right.
> > 
> > 
> > i think i was wrong and you are right.
> > 
> > so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
> > starting with 'something == 0' the exiting E and remaining R threads:
> > 
> > E:something=1      // protected by killlock
> > E:UNLOCK(killlock)
> > E:need_locks=-1
> > 
> > R:LOCK(unrelated)  // reads need_locks == -1
> > R:need_locks=0
> > R:UNLOCK(unrelated)
> > R:LOCK(killlock)   // does not lock
> > R:read something   // can it be 0 ?
> > 
> > and here something can be 0 (ie. not protected by killlock) on aarch64
> > because
> > 
> > T1
> > 	something=1
> > 	ldaxr ... killlock
> > 	stlxr ... killlock
> > 	need_locks=-1
> > 
> > T2
> > 	x=need_locks
> > 	ldaxr ... unrelated
> > 	stlxr ... unrelated
> > 	y=something
> > 
> > can end with x==-1 and y==0.
> > 
> > and to fix it, both a_fetch_add and a_cas need an a_barrier.
> > 
> > i need to think how to support such lock usage on aarch64
> > without adding too many dmb.
> 
> I don't really understand this, but FWIW gcc emits 
> 
>     ldxr
>     ...
>     stlxr
>     ...
>     dmb ish
> 
> for __sync_val_compare_and_swap. So this is probably the right thing
> we should have. And it seems to match what the kernel folks discussed
> here:
> 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229588.html
> 
> I wondered if there are similar issues for any others archs which need
> review, but it looks like all the other llsc archs have explicit
> pre/post barriers defined.

Actually I don't understand what's going on with cmpxchg there. The
patch I linked has it using ldxr/stxr (not stlxr) for cmpxchg. There's
some follow-up in the thread I don't understand, about the case where
the cas fails, but we already handle that by doing an explicit barrier
in that case.

This stuff is a poorly-documented mess. >_<

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.