musl - Re: Moving forward with sh2/nommu

Message-ID: <20150611151252.GW17573@brightrain.aerifal.cx>
Date: Thu, 11 Jun 2015 11:12:52 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: "D. Jeff Dionne" <Jeff@...inux.org>, ysato@...rs.sourceforge.jp,
	shumpei.kawasaki@...wc.com
Subject: Re: Moving forward with sh2/nommu

On Wed, Jun 10, 2015 at 11:02:35PM -0500, Rob Landley wrote:
> 
> 
> On 06/09/2015 10:30 PM, Rich Felker wrote:
> > On Mon, Jun 01, 2015 at 11:11:07AM -0400, Rich Felker wrote:
> >> [resent to musl list]
> >>
> >> Here's a summary of the issues we need to work through to get a modern
> >> SH2/nommu-targeted musl/toolchain out of the proof-of-concept stage
> >> and to the point where it's something people can use roughly 'out of
> >> the box':
> >>
> >> Kernel issues:
> >>
> >> 1. Kernel should support loading plain ELF directly, unmodified. Right
> > 
> > I have a patch to do this, not polished but it works. It also...
> > 
> >> 2. Kernel insists on having a stack size set in the PT_GNU_STACK
> >>    program header; if it's 0 (the default ld produces) then execve
> >>    fails. It should just provide a default, probably 128k (equal to
> >>    MMU-ful Linux).
> > 
> > ...uses a default stack size of 128k if the header is 0, and...
> 
> That's big enough to give a false sense of security without being big
> enough to actually work reliably on software that isn't thinking about it.

It's the same size MMU-ful Linux gives you, and it works for the vast
majority of applications. Of course with MMU you have the option to
expand on faults, but I think the only application I've ever seen do
this is emacs, and it typically only uses 164k. If you want to survey
existing applications this is easy to do using busybox "top" in "s"
mode, sorted by stack size. I wouldn't object to making the default
slightly larger if measurement shows it's needed.

Keep in mind that the purpose here is running binaries which are _not_
NOMMU-specific and not tuned for NOMMU. If your binaries are
NOMMU-specific then you can choose a proper stack size.
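
The defaulting rule discussed above can be sketched roughly as follows.
This is an illustration only, not the kernel patch: the names
pick_stack_size and DEFAULT_STACK_SIZE are made up for this example,
and it assumes the requested size is carried in the p_memsz field of
the PT_GNU_STACK program header.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical default, matching the 128k figure discussed above. */
#define DEFAULT_STACK_SIZE (128UL * 1024)

/* Return the stack size a loader might use: the p_memsz of the
 * PT_GNU_STACK header if it is nonzero, otherwise the default.
 * Assumption: the size request lives in p_memsz. */
static unsigned long pick_stack_size(const Elf32_Phdr *ph, size_t n)
{
	assert(ph != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (ph[i].p_type == PT_GNU_STACK && ph[i].p_memsz != 0)
			return ph[i].p_memsz;
	return DEFAULT_STACK_SIZE;
}
```

A binary built for NOMMU with an explicit stack size keeps that size;
an ordinary binary with a zero-sized header gets the default instead
of failing to exec.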

> The kernel stack size is apparently 4k, but they've got more than one
> for different contexts:
> 
> https://www.kernel.org/doc/Documentation/x86/x86_64/kernel-stacks

Kernel stack is completely different. Userspace will not run with a 4k
stack if you use libc in any nontrivial way.

> I think I agree with Jeff here: the historical default stack size for
> nommu packaging was 8k. That's enough for hello world to work, but in
> this context stack is a resource we need to allocate if we want
> nontrivial amounts of it.

That's doable (though still a bad idea, I think) as a default for a
NOMMU toolchain. It's not okay for running a binary that's not made
for NOMMU. This is not SH-specific but something that affects normal
ELF binaries running on any NOMMU machine, and it's an issue I'm going
to be adamant on, because adding the ability to do this with a huge
security flaw is just irresponsible.

> >> 4. Syscall trap numbers differ on SH2 vs SH3/4. Presumably the reason
> >>    is that these two SH2A hardware traps overlap with the syscall
> >>    range used by SH3/4 ABI:
> > 
> > I haven't patched this yet. I'd like to use 31 (0x1f) as the new
> > universal SH syscall trap number, instead of 22. More details on the
> > reasons later.
> 
> I've cc'd Yoshinori Sato (who did most of the historical sh2 work) and
> Shumpei Kawasaki (the original superh architect). They'll probably have
> an opinion on your "more reasons" for changing sh2 system call numbers
> to match sh4.

Thank you. I'd really like to make progress at least on the matter of
determining if this is feasible. I now have a new musl/sh2 patch that
simply uses "trapa #31" unconditionally, and it's a lot
simpler/cleaner and working on my patched kernel. The big question is
just whether this is an unacceptable constraint on hardware.

> >> musl issues:
> >>
> >> 1. We need runtime detection for the right trap number to use for
> >>    syscalls. Right now I've got the trap numbers hard-coded for SH2 in
> >>    my local tree.
> > 
> > I've written the runtime detection, but I'd rather not have to use it.
> > I managed to avoid inlining a big conditional at each syscall, but
> > there are still multiple ugly issues:
> 
> This is only an issue if you want sh4 to be able to run sh2 binaries.

Yes, and I've explained why this is important. In summary:

- It lets you do initial testing (or even native compiling) on the
  MMU-ful variant of the target where you have things like memory
  protection to assist you in debugging and the ability to use a lot
  of software that wouldn't work on NOMMU.

- It fights the combinatoric explosion of configuration/target
  variants that made uclibc so difficult to maintain by minimizing the
  number of differences and by allowing regression testing without
  specialized hardware (testing can be done on real hardware or on
  qemu-system-sh4eb).

> (Can m68k run coldfire binaries? Last I checked there wasn't any
> blackfin-with-mmu variant.)

As long as you use an ISA level that's a subset of both, there's no
reason for it not to work. Demonstrating it without getting a common
libc working on both would just be a matter of making a -nostdlib PoC
program doing its own syscalls.

> > [...]
> 
> We should finagle together an XIP test setup. There were two different
> people doing it at ELC we could ask questions of.

From the userspace side (libc/startfiles/toolchain/etc.) XIP should
just work once we can make binaries in a format that allows for
shareable text. XIP-capable hardware isn't really needed to test this
side, just the kernel side.

> >> 2. We need additional runtime detection options for atomics: interrupt
> >>    masking for plain SH2, and the new CAS instruction for SH2J.
> > 
> > This is the one thing I haven't done, so currently the atomic macros
> > are using GUSA which is non-atomic and unsafe on SH2 (if an interrupt
> > happens with invalid stack pointer, memory will be corrupted).
> 
> And doesn't work with SMP at all.

Right. I just added interrupt-masking-based atomics for SH2 on my
side, but I know that's not useful for your SMP setups. It is useful
for older non-SMP SH2 hardware, though.
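
As a rough sketch of what runtime selection between the two atomic
strategies could look like (not the actual musl code): every name
below is hypothetical, irq_mask/irq_unmask are host-side stand-ins for
real interrupt masking, and a compiler builtin stands in for a
hardware CAS instruction so the example can be built anywhere.

```c
#include <assert.h>

/* Hypothetical CPU classes; the names are illustrative only. */
enum cpu_kind { CPU_SH2_PLAIN, CPU_SH2_CAS };

/* Stand-ins for masking/unmasking interrupts. Real code would change
 * the CPU's interrupt mask; here we only track nesting depth. */
static int irq_mask_depth;
static void irq_mask(void)   { irq_mask_depth++; }
static void irq_unmask(void) { irq_mask_depth--; }

/* CAS by interrupt masking: only correct on uniprocessor systems. */
static int cas_irqmask(volatile int *p, int expect, int newval)
{
	irq_mask();
	int old = *p;
	if (old == expect) *p = newval;
	irq_unmask();
	return old;
}

/* CAS via compiler builtin, standing in for a CAS instruction. */
static int cas_hw(volatile int *p, int expect, int newval)
{
	return __sync_val_compare_and_swap(p, expect, newval);
}

/* Selected once at startup, then used by every atomic operation. */
static int (*cas_impl)(volatile int *, int, int) = cas_irqmask;

static void select_atomics(enum cpu_kind k)
{
	cas_impl = (k == CPU_SH2_CAS) ? cas_hw : cas_irqmask;
}

/* Returns the old value; the swap happened iff it equals expect. */
static int a_cas(volatile int *p, int expect, int newval)
{
	assert(cas_impl != 0);
	return cas_impl(p, expect, newval);
}
```

The indirection through one function pointer keeps the choice out of
each call site, at the cost of an indirect call per atomic operation.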

> > This
> > could be part of the random crashing I've been experiencing (although
> > I reproduced it without musl) so I'll try to add them next.
> 
> I'm going to try to post kernel patches to the list later today, and
> separately email you the horrible ethernet drivers that aren't going
> upstream because horrible. (We need to clean up the ethernet VHDL too,
> it has timing issues that randomly work or don't work because layout
> butterfly effects. Known problem, on the todo list, not my area...)

OK thanks!

> >> 3. We need sh/vfork.s since the default vfork.c just uses fork, which
> >>    won't work. I have a version locally but it doesn't make sense to
> >>    commit without runtime trap number selection.
> > 
> > Done and updated to use runtime selection in the (ugly) patch.
> 
> If they ask for vfork() they should get vfork()...?

Yes. The "runtime selection" is about the syscall trap number, not
whether or not to use vfork. I committed vfork to upstream musl now,
but with a SH3/4 trap number to be consistent with the code that's
upstream now. Later I'll either convert them all to trap 31 (0x1f) if
that ends up being acceptable, or merge the runtime-selection code,
but I think it makes sense to make the change across all files at
once, whichever way it's done.
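
For readers unfamiliar with why a real vfork matters here, a minimal
usage sketch follows. It is independent of the trap number question,
and run_child is a made-up helper name. The child shares the parent's
memory until it calls _exit or an exec function, which is why this
pattern does not need an MMU.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits immediately with the given code and return
 * that code as seen by the parent, or -1 on failure. */
static int run_child(int code)
{
	int status;
	assert(code >= 0 && code <= 255);
	pid_t pid = vfork();
	if (pid < 0) return -1;
	if (pid == 0) _exit(code);  /* child: only _exit or exec here */
	if (waitpid(pid, &status, 0) != pid) return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In real use the child would normally call one of the exec functions
rather than _exit directly.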

> >> 4. As long as we're using the FDPIC ELF header flag to get
> >>    binfmt_elf_fdpic.c to load binaries, the startup code needs to call
> >>    the personality() syscall to switch back. I have a local hack for
> >>    doing this in rcrt1.o which is probably not worth upstreaming if we
> >>    can just make the kernel do it right.
> > 
> > No longer needed because of the kernel patch to load normal ELF.
> 
> Send me the patch and I'll add it to my stack to go upstream.

It was attached, but it may need a little bit more cleanup before
going upstream.

> >> 5. The brk workaround I'm doing now can't be upstreamed without a
> >>    reliable runtime way to distinguish nommu. To put it in malloc.c
> >>    this would have to be a cross-arch solution. What might make more
> >>    sense is putting it in syscall_arch.h for sh, where we already
> >>    have to check for SH2 to determine the right trap number; the
> >>    inline syscall code can just do if (nr==SYS_brk&&IS_SH2) return 0;
> > 
> > Commit 276904c2f6bde3a31a24ebfa201482601d18b4f9 in musl solves this in
> > a general manner, even though it's no longer needed with my kernel
> > patch applied.
> 
> No longer needed on sh2 maybe, but there are a half-dozen other nommu
> targets of interest...

The change is not sh-specific. This is in the fdpic elf loader code
which is arch-generic. So if we can get the fix upstreamed, the issue
won't matter on any future kernel versions, but it's still good for
musl to be safe against being run on old kernels.

> > One more musl-side issue I neglected to mention is that __unmapself.s
> > can't work on SH2 because the SH2 trap/interrupt mechanism requires
> > the userspace stack pointer to be valid at all times. This is now
> > solved upstream in commit c30cbcb0a646b1f13a22c645616dce624465b883,
> > but activating it for SH2 requires removing
> > src/thread/sh/__unmapself.s so the generic C file gets used.
> > 
> > The attached patch covers everything described above that's not
> > already upstream, and is sufficient to build musl for sh2 with
> > musl-cross targeting "sheb-linux-musl". I used gcc 4.7.3 because later
> > versions break the kernel. The attached config.mak for musl shows the
> > configure options I used. The attached sheb.specs file is how I got
> > gcc to do always-PIE without breaking the kernel.
> 
> For the newly cc'd the relevant web archive entry is:
> 
> http://www.openwall.com/lists/musl/2015/06/10/1

Thanks!

Rich