Date: Wed, 20 Jan 2016 20:22:29 -0500
From: Rich Felker <dalias@...c.org>
To: Oleg Endo <oleg.endo@...nline.de>
Cc: gcc@....gnu.org, musl@...ts.openwall.com
Subject: Re: SH runtime switchable atomics - proposed design

On Thu, Jan 21, 2016 at 08:08:18AM +0900, Oleg Endo wrote:
> On Tue, 2016-01-19 at 15:28 -0500, Rich Felker wrote:
> > I've been working on the new version of runtime-selected SH atomics
> > for musl, and I think what I've got might be appropriate for GCC's
> > generated atomics too. I know Oleg was not very excited about doing
> > this on the gcc side from a cost/benefit perspective
> 
> I am just not keen on making this the default atomic model for SH.
> If you have a system built around this atomic model and want to add it
> to GCC, please send in patches.  Just a few comments below...

OK, thanks for clarifying. I don't have a patch yet but I might do one
later. Sato-san's work on adding direct cas.l support showed me how
this part of the gcc code seems to work, so it shouldn't be too hard
to hook it up, but there are still ABI design considerations to settle
if we decide to go this way.

> > Inputs:
> > - R0: Memory address to operate on
> > - R1: Address of implementation function, loaded from a global
> > - R2: Comparison value
> > - R3: Value to set on success
> > 
> > Outputs:
> > - R3: Old value read, ==R2 iff cas succeeded.
> 
> > Preserved: R0, R2.
> > 
> > Clobbered: R1, PR, T.
> 
> The T bit is obviously the result of the cas operation.  So you could
> use it as an output directly instead of the implicit R3 == R2
> condition.

I didn't want to impose a requirement that all backends leave the
result in the T bit. At the C source level, I think most software uses
old==expected as the test for success; this is the API
__sync_val_compare_and_swap provides, and what people used to x86
would naturally do anyway.
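
To illustrate the old==expected convention, here is a minimal C sketch.
The helper name cas_succeeded is hypothetical and only for illustration;
__sync_val_compare_and_swap is the GCC builtin mentioned above.

```c
#include <stdbool.h>

/* Hypothetical helper: reports success by comparing the value the cas
 * read back against the value the caller expected, with no separate
 * status flag (such as the T bit) involved. */
static bool cas_succeeded(int *p, int expected, int desired)
{
    int old = __sync_val_compare_and_swap(p, expected, desired);
    return old == expected;
}
```

The caller never looks at a condition flag; the returned old value alone
says whether the swap happened.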

> > This call (performed from __asm__ for musl, but gcc would do it as SH
> > "SFUNC") is highly compact/convenient for inlining because it avoids
> > clobbering any of the argument registers that are likely to already be
> > in use by the caller, and it preserves the important values that are
> > likely to be reused after the cas operation.
> > 
> > For J2 and future J4, the function pointer just points to:
> > 
> > 	rts
> > 	 cas.l r2,r3,@r0
> > 
> 
> > and the only costs vs an inline cas.l are loading the address of the
> > function (done in the caller; involves GOT access) and clobbering R1
> > and PR.
> > 
> > This is still a draft design and the version in musl is subject to
> > change at any time since it's not a public API/ABI, but I think it
> > could turn into something useful to have on the gcc side with a
> > -matomic-model=libfunc option or similar. Other ABI considerations for
> > gcc use would be where to store the function pointer and how to
> > initialize it. To be reasonably efficient with FDPIC the caller needs
> > to be responsible for loading the function pointer (and it needs to
> > always point to code, not a function descriptor) so that the callee
> > does not need a GOT pointer passed in.
> 
> Obviously the ABI has been constructed around the J-core's cas.l
> instruction.

Yes, but that was a choice I made after a first draft that was no more
optimal for the other backends and less optimal for J-core. And the
only real choices that were based on the instruction's properties were
using r0 for the address input and swapping the old value into r3
rather than producing it in a different register. Other than these
minor details, the ABI was guided more by avoiding clobbers/reloads of
potentially valuable data in the caller.
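
As a rough C-level model of the runtime selection (not the real thing:
the actual call uses the custom register convention described above,
which plain C cannot express), the dispatch could be sketched like this.
The names cas_fn, cas_generic, cas_impl and a_cas_model are all
hypothetical, and the builtin is only a stand-in for a real backend.

```c
/* Hypothetical signature for a cas backend: returns the old value. */
typedef int (*cas_fn)(volatile int *p, int expected, int desired);

/* Stand-in backend; a real system would provide one per atomic model
 * (cas.l, llsc, gusa, imask, ...). */
static int cas_generic(volatile int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* Global function pointer; assumed to be set once at startup after
 * detecting which model the running CPU supports. */
static cas_fn cas_impl = cas_generic;

/* The caller loads the pointer itself and calls through it, so the
 * callee needs no GOT pointer of its own. */
static inline int a_cas_model(volatile int *p, int expected, int desired)
{
    return cas_impl(p, expected, desired);
}
```

In the register-level version, the load of cas_impl corresponds to
loading R1 in the caller.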

One possible change I just thought of: with one extra instruction in
the J-core version we could have the result come out in r1 and
preserve r3. Similar changes to the other versions are probably easy.

> Do you have plans to add other atomic operations (like
> arithmetic)?

No, at least not in musl. From musl's perspective cas is the main one
that's used anyway. But even in general I don't think there's a
significant advantage to doing 'direct' arithmetic ops without a cas
loop even when you can (with llsc, gusa, or imask model). With gusa
and imask the only time you benefit from not implementing them in
terms of cas is on the _highly_ unlucky/unlikely occasion where an
interrupt occurs between the old-value read before cas and the cas.
For llsc there's more potential advantage because actual smp
contention is possible, but sh4a is probably not a very interesting
target anymore.
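
For concreteness, an arithmetic op built on cas looks roughly like the
sketch below. The function name fetch_add_via_cas is hypothetical, and
the GCC builtin stands in for whichever cas backend is in use.

```c
/* Hypothetical fetch-and-add implemented as a cas loop. The loop only
 * retries if *p changed between the plain read and the cas, which is
 * the unlucky window described above. */
static unsigned fetch_add_via_cas(volatile unsigned *p, unsigned inc)
{
    unsigned old;
    do {
        old = *p;  /* old-value read before the cas */
    } while (__sync_val_compare_and_swap(p, old, old + inc) != old);
    return old;    /* value seen before the addition */
}
```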

> If not, then I'd suggest to name the atomic model
> "libfunc-musl-cas".

I'm not sure how the "musl" naming here makes sense unless you're
thinking of having it just call into musl's definitions, which is
certainly a possible design but not what I had in mind. I was thinking
of adapting the design to gcc and providing something similar via
libgcc.a.

Rich
