Date: Mon, 18 May 2015 17:56:18 -0500
From: Bobby Bingham <koorogi@...rogi.info>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] inline llsc atomics when compiling for sh4a

On Sun, May 17, 2015 at 10:34:02PM -0400, Rich Felker wrote:
> On Sun, May 17, 2015 at 01:55:16PM -0500, Bobby Bingham wrote:
> > If we're building for sh4a, the compiler is already free to use
> > instructions only available on sh4a, so we can do the same and inline the
> > llsc atomics. If we're building for an older processor, we still do the
> > same runtime atomics selection as before.
>
> Thanks! I think it's ok for commit as-is, but based on re-reading this
> code I have some ideas for improving it that are orthogonal to this
> change. See comments inline:

Would you prefer that I resend this patch with the LLSC_* macros removed
at the same time, or send a separate patch to remove them?

>
> > ---
> >  arch/sh/atomic.h     |  83 +++++++++++++++++++++++++++++++
> >  arch/sh/src/atomic.c | 135 +++++++++++++++++----------------------------------
> >  2 files changed, 128 insertions(+), 90 deletions(-)
> >
> > diff --git a/arch/sh/atomic.h b/arch/sh/atomic.h
> > index a1d22e4..f2e6dac 100644
> > --- a/arch/sh/atomic.h
> > +++ b/arch/sh/atomic.h
> > @@ -22,6 +22,88 @@ static inline int a_ctz_64(uint64_t x)
> >  	return a_ctz_l(y);
> >  }
> >
> > +#define LLSC_CLOBBERS "r0", "t", "memory"
> > +#define LLSC_START(mem) "synco\n"  \
> > +	"0:	movli.l @" mem ", r0\n"
> > +#define LLSC_END(mem)              \
> > +	"1:	movco.l r0, @" mem "\n"    \
> > +	"	bf 0b\n"                   \
> > +	"	synco\n"
> > +
> > +static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> > +{
> > +	int old;
> > +	__asm__ __volatile__(
> > +		LLSC_START("%1")
> > +		"	mov r0, %0\n"
> > +		"	cmp/eq %0, %2\n"
> > +		"	bf 1f\n"
> > +		"	mov %3, r0\n"
> > +		LLSC_END("%1")
> > +		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> > +	return old;
> > +}
>
> The mov from r0 to %0 seems unnecessary here. Presumably it's because
> you didn't have a constraint to force old to come out via r0. Could
> you do the following?

No, because the "mov %3, r0" a couple of instructions down clobbers r0,
and that mov is necessary because movco.l only accepts r0 as its source
operand.
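 
To spell out the conflict (an annotated sketch of the sequence involved,
not exact compiler output):

	0:	movli.l @%1, r0    ! old value arrives in r0
		cmp/eq r0, %2      ! compare it against t
		bf 1f
		mov %3, r0         ! on success, r0 must now hold s,
	1:	movco.l r0, @%1    ! because movco.l only stores from r0
		bf 0b

So the old value has to be copied out of r0 into a separate output
register before r0 is reused for the store.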

>
> static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> {
> 	register int old __asm__("r0");
> 	__asm__ __volatile__(
> 		LLSC_START("%1")
> 		"	cmp/eq r0, %2\n"
> 		"	bf 1f\n"
> 		"	mov %3, r0\n"
> 		LLSC_END("%1")
> 		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
> 	return old;
> }
>
> Note that I had to drop LLSC_CLOBBERS because "r0" was in it and you
> can't clobber an output register.
>
> I've actually always wondered about the value of having the LLSC_*
> macros. I usually prefer for the whole asm to be written out
> explicitly and readable unless there's a compelling reason to wrap it
> up in macros. Then it would look like:

I think it made a bigger difference for the gusa version, so I mostly
did it with llsc for consistency.  And before this patch, when the gusa
and llsc versions were side by side in the same function, it made it
easier for me to verify that both versions were doing the same thing as
I wrote them.

Now that the llsc version is moving, I'm less attached to the LLSC_*
macros.  I do think the gusa stuff is ugly and magical enough that I'd
still prefer to keep it hidden away, if you don't object.

>
> static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> {
> 	register int old __asm__("r0");
> 	__asm__ __volatile__(
> 		"	synco\n"
> 		"0:	movli.l @%1, r0\n"
> 		"	cmp/eq r0, %2\n"
> 		"	bf 1f\n"
> 		"	mov %3, r0\n"
> 		"1:	movco.l r0, @%1\n"
> 		"	bf 0b\n"
> 		"	synco\n"
> 		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : "t", "memory");
> 	return old;
> }
>
> and similar for other functions. Part of the motivation of not hiding
> the outer logic in macros is that it might make it possible to
> fold/simplify some special cases like I did above for CAS.

I don't mind in principle, but I think the fact that movco.l requires
its input to be in r0 means there isn't actually any simplification you
can do.

>
> Another idea is letting the compiler simplify, with something like the
> following, which could actually be used cross-platform for all
> llsc-type archs:
>
> static inline int __sh_cas_llsc(volatile int *p, int t, int s)
> {
> 	int old;
> 	do old = llsc_start(p);
> 	while (old == t && !llsc_end(p, s));
> 	return old;
> }
>
> Here llsc_start and llsc_end would be inline functions using asm with
> appropriate constraints. Unfortunately I don't see a way to model
> using the value of the truth flag "t" as the output of the asm for
> llsc_end, though. I suspect this would be a problem on a number of
> other archs too; the asm would have to waste an instruction (or
> several) converting the flag to an integer. Unless there's a solution
> to that problem, it makes an approach like this less appealing.

I agree this would be even nicer if we could make it work.
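 
For reference, here's a rough, untested sketch of what llsc_start and
llsc_end might look like for sh4a, just to make the flag problem
concrete (the names come from your example; the bodies are my guess):

static inline int llsc_start(volatile int *p)
{
	int v;
	__asm__ __volatile__(
		"	synco\n"
		"	movli.l @%1, %0\n"   /* load-linked; result must land in r0 ("z") */
		: "=z"(v) : "r"(p) : "memory");
	return v;
}

static inline int llsc_end(volatile int *p, int v)
{
	int r;
	__asm__ __volatile__(
		"	mov %2, r0\n"        /* movco.l only stores from r0 */
		"	movco.l r0, @%1\n"   /* store-conditional, sets T on success */
		"	movt %0\n"           /* extra instruction: T flag -> integer */
		"	synco\n"
		: "=r"(r) : "r"(p), "r"(v) : "r0", "t", "memory");
	return r;
}

The movt is exactly the wasted flag-to-integer conversion you mention,
and whether the compiler can be trusted not to schedule memory accesses
between the two asm blocks is another open question.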

>
> >  int __sh_cas(volatile int *p, int t, int s)
> >  {
> > +	if (__hwcap & CPU_HAS_LLSC) return __sh_cas_llsc(p, t, s);
> > +
> >  	int old;
> > -	if (__hwcap & CPU_HAS_LLSC) {
> > -		__asm__ __volatile__(
> > -			LLSC_START("%1")
> > -			"	mov r0, %0\n"
> > -			"	cmp/eq %0, %2\n"
> > -			"	bf 1f\n"
> > -			"	mov %3, r0\n"
> > -			LLSC_END("%1")
> > -			: "=&r"(old) : "r"(p), "r"(t), "r"(s) : LLSC_CLOBBERS);
> > -	} else {
> > -		__asm__ __volatile__(
> > -			GUSA_START_EVEN("%1", "%0")
> > -			"	cmp/eq %0, %2\n"
> > -			"	bf 1f\n"
> > -			GUSA_END("%1", "%3")
> > -			: "=&r"(old) : "r"(p), "r"(t), "r"(s) : GUSA_CLOBBERS, "t");
> > -	}
> > +	__asm__ __volatile__(
> > +		GUSA_START_EVEN("%1", "%0")
> > +		"	cmp/eq %0, %2\n"
> > +		"	bf 1f\n"
> > +		GUSA_END("%1", "%3")
> > +		: "=&r"(old) : "r"(p), "r"(t), "r"(s) : GUSA_CLOBBERS, "t");
> >  	return old;
> >  }
>
> For the GUSA stuff, do you really need the ODD/EVEN macros? I think
> you could just add an appropriate .align inside, which would cause the
> assembler to insert a nop if necessary.

Should be possible.  I'll work on it and send another patch.

>
> Rich

--
Bobby Bingham
