musl - Re: SH sigcontext ABI is broken

Date: Wed, 24 Jun 2015 19:54:08 -0400
From: Rich Felker <dalias@...c.org>
To: Rob Landley <rob@...dley.net>
Cc: musl@...ts.openwall.com
Subject: Re: SH sigcontext ABI is broken

On Wed, Jun 24, 2015 at 05:32:59PM -0500, Rob Landley wrote:
> 
> 
> On 06/24/2015 04:34 PM, Rich Felker wrote:
> > On Wed, Jun 24, 2015 at 03:40:50AM -0500, Rob Landley wrote:
> >>> Since our new SH2 binaries (using ELF, musl, and possibly glibc if the
> >>> port is not dropped) are also going to be compatible with running on
> >>> later MMU-ful hardware (e.g. J4), I don't want this same issue to be a
> >>> point of breakage for them.
> >>
> >> This is the "lost the plot" part. I don't get it. What's the point? I do
> >> not understand why you have this as a goal.
> > 
> > I'll try to keep this brief since it's not the point of this thread:
> > 
> > - Ability to test your binaries with qemu-[system-]sh4[eb].
> 
> I have a todo item to teach that about sh2.

For system-level, that makes sense if you want to debug nommu kernels.
But if you want to test or debug userspace stuff, you're much better
off with the existing qemu-[system-]sh4[eb]. It will run your sh2
binaries fine, and you get all the added benefits of mmu.

> > - Ability to test and debug on a real machine with MMU where crashes
> >   are debuggable and don't bring down the whole system.
> 
> If what you want is a debugging tool, a qemu-sh2 application emulation
> could in theory debug those crashes too.

I don't see how qemu-sh2 would differ from qemu-sh4 at all (for app
level emulation) aside from restricting the ISA subset you can use. To
userspace they look the same.

> > - Avoiding death-by-target-combinatorics (ala uclibc).
> 
> sh2 will not run an sh4 binary. Restricting sh4 to only running sh2
> binaries would cripple the platform. The remaining sliver of use case
> seems to be debugging.

No, sh2 vs sh4-nofpu is the same as -march=i486 vs -march=i686. The
only difference is ISA level. Nobody is restricting you from running
i686 binaries, but you can use the baseline ISA when you want
compatibility with both, and (most importantly, for avoiding death by
combinatorics) the differences are entirely at the compiler level, so
there is no additional cost to supporting both beyond what a compiler
already provides.

In the case of sh4 with the fpu abi, it's a different abi, but this
still massively reduces the number of combinations. Instead of:

	sh2, sh2a, sh3, sh4-nofpu, sh4/4a

(and LE/BE variants of each) you only have LE/BE versions of two
underlying ABIs:

	sh-nofpu, sh

This is on par with _most_ risc archs.
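
To make the arithmetic above concrete, here is a minimal sketch in C. The array names and the helper `target_count` are made up for illustration; the variant names are the ones listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Per-ISA targets, as listed above. */
static const char *const per_isa_targets[] = {
    "sh2", "sh2a", "sh3", "sh4-nofpu", "sh4/4a"
};

/* Per-ABI targets: one nofpu ABI and one fpu ABI. */
static const char *const per_abi_targets[] = {
    "sh-nofpu", "sh"
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: total builds = variants times endiannesses. */
static size_t target_count(size_t variants, size_t endians)
{
    return variants * endians;
}
```

With LE and BE for each, `target_count(COUNT(per_isa_targets), 2)` gives 10 combinations, while `target_count(COUNT(per_abi_targets), 2)` gives 4.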

> > - Upgrade path from SH2 to SH4.
> 
> It's open source. You can recompile.
> 
> > - Sharing base userspace between low-end SH2 devices and higher-end
> >   SH4-based model.
> 
> It's open source. You can recompile.

Just because the OS is open source doesn't mean that a device/product
based on it is. Even if it is, there are plenty of hidden costs, such as
users bricking devices when they upgrade with the wrong firmware.

> > If you want to discuss this in more detail let's do it somewhere other
> > than in a thread CC'd on several lists it's not terribly relevant to.
> 
> Trimmed the glibc and kernel lists from this reply.

Thanks.

> >>> The userspace SH2 ABI is nofpu (no float registers for float args), so
> >>> there is already a separate userspace ABI for SH2 (and SH3) vs the
> >>> usual SH4 ABI with float. That's not a problem.
> >>
> >> Yes, a separate ABI for sh2 vs sh4 has not, historically speaking, been
> >> a problem.
> > 
> > Separate userspace ABI is not a problem. The problem is the kernel
> > ABI.
> 
> The binary loader is kernel abi.

Yes and no; it's completely possible to do the loader in userspace with
binfmt_misc. But the legacy brokenness of the nommu kernel in terms of
not supporting plain ELF is just a bug that will soon be in the past.
There's no actual reason for it.

> The system call interface is kernel abi.

Not in the same way ucontext_t is. If you really want to stick with
the backwards way things were done on sh2 in the past, the system call
interface can be defined as "you use trap 47 if this bit in
AT_PLATFORM is set, and trap 31 if it's not" and then all SH variants
have the exact same "syscall ABI" with no changes to the kernel. But
since there's so much broken stuff we need to fix already, we might as
well unify this on the kernel side and get rid of that ugliness.
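
A minimal sketch of the "pick the trap number from a flag" idea, in C. The macro `HYPOTHETICAL_TRAP47_BIT`, its value, and the function name are assumptions for illustration only; nothing here names a real kernel or libc interface. How the flag word would actually reach userspace (for example through the auxiliary vector) is left out, since no such bit is defined.

```c
#include <assert.h>

/* Hypothetical flag bit; no such bit is defined by any real kernel. */
#define HYPOTHETICAL_TRAP47_BIT 0x1UL

/*
 * Choose the syscall trap number from a platform flag word:
 * trap 47 if the bit is set, trap 31 if it is not.
 */
static int syscall_trap_number(unsigned long platform_flags)
{
    return (platform_flags & HYPOTHETICAL_TRAP47_BIT) ? 47 : 31;
}
```

The point is only that the choice is a single runtime branch in the syscall stub, so every SH variant could share one definition of the "syscall ABI".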

There is no way you can say "the definition of this C type varies at
runtime" within a given ABI. Offsets are hard-coded by the compiler at
compile time.
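
A small C sketch of why that is. Both structs below are simplified, hypothetical layouts and are NOT the real SH sigcontext/ucontext_t definitions; the names `ctx_nofpu`, `ctx_fpu`, and `read_word_at` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout without a float register block. */
struct ctx_nofpu {
    unsigned long regs[16];
    unsigned long pc;
};

/* Hypothetical layout with a float register block before pc. */
struct ctx_fpu {
    unsigned long regs[16];
    unsigned long fpregs[16];
    unsigned long pc;
};

/* The compiler bakes these offsets into the binary as constants. */
enum {
    PC_OFF_NOFPU = offsetof(struct ctx_nofpu, pc),
    PC_OFF_FPU   = offsetof(struct ctx_fpu, pc)
};

/* Read one word at a fixed offset, as compiled code effectively does. */
static unsigned long read_word_at(const void *ctx, size_t off)
{
    unsigned long v;
    memcpy(&v, (const char *)ctx + off, sizeof v);
    return v;
}
```

If the kernel fills in a `struct ctx_fpu` but the program was compiled against `struct ctx_nofpu`, then `read_word_at(ctx, PC_OFF_NOFPU)` returns the first float register slot instead of `pc`. Nothing in the binary can adapt at runtime, because the offset was fixed when it was compiled.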

> >>> If we switch to using the same ucontext_t layout everywhere, the
> >>> kernel does not have to be smart, and the kernel ABI looks the same
> >>> for all SH variants, but old binaries (if they depend on ucontext_t
> >>> layout, which is _rare_ anyway) could break.
> >>
> [...]
> 
> There was never any glibc sh2 support. There's no installed base there,
> it was all uclibc, and uclibc is dead, and never had binary
> compatibility even between different configs of the same version.
> 
> I also note that Linux support for sh2 went in via commit 9d4436a6fbc8
> in November 2006 which is 9 years ago (2.6.19), so the oldest binaries
> we'd worry about are less than a decade old anyway.

This was more about breaking sh3.

> > But unless someone steps forward and says SH3 ucontext_t ABI is
> > important to existing applications that are deploying new kernels, I
> > think we can just wait to address this issue (with a personality)
> > if/when it ever arises.
> > 
> >>> My leaning at this point, especially since you say SH3 is irrelevant,
> >>> is to use the same ucontext_t layout for them all (with the float reg
> >>> space empty for nofpu chips). If any real-world old apps break and
> >>> people care about them, we could make a personality that you set
> >>> manually for old-nofpu ucontext_t layout. But I suspect the issue will
> >>> just go away.
> >>
> >> I suspect the issue will just go away too.
> >>
> >> After more patents expire next year, we can add full sh4 compatibility
> >> to j-core. If we want a better userspace api ala x86's x32 or mips
> >> o32/n32/nubi or arm's oabi/eabi, we can do that. (In fact that's one of
> >> 0pf.org's goals, kawasaki-san is _trying_ to run a standards body. If
> >> you want to wave an abi proposal at him for comment, he is the original
> >> superh architect...)
> > 
> > The SH ABI seems pretty good as-is, especially considering the
> > constraints it's working with. The only additional need for ABI work I
> > see at the moment is getting FDPIC working.
> 
> I thought it was too. I'm confused by the sh2/sh4 unification effort...

There's basically nothing to be done for it. Only one problem has been
found and it also affects sh3 and sh4-nofpu; it's not specific to sh2
unification.

You could call the trap number an issue too, but that's just clean-up
to avoid ugly hacks to work with the old system.

> >> I want musl to support sh2 but I _also_ want it to support coldfire and
> >> h8300 and so on. If musl is the successor to uclibc (which needs to be
> >> put out of its misery), it needs nommu support for several different
> >> architectures. If you insist that every nommu architecture must also run
> >> those nommu binaries on with-mmu sibling architectures, you're going to
> >> be unifying coldfire and m68k next...
> > 
> > If you look at the kernel I'm pretty sure that already works...
> > Coldfire does not seem to be a separate arch/ABI as far as the kernel
> > is concerned.
> 
> I'll take your word for it.

OK. You can see it for real once someone does an m68k arch for musl.
:-)

Rich