Date: Mon, 21 Dec 2020 20:39:25 -0500
From: Rich Felker <dalias@...c.org>
To: Jesse DeGuire <jesse.a.deguire@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: Musl on Cortex-M Devices

On Mon, Dec 21, 2020 at 06:58:47PM -0500, Jesse DeGuire wrote:
> On Fri, Dec 18, 2020 at 12:30 PM Rich Felker <dalias@...c.org> wrote:
> > If it lacks LDREX and STREX how do you implement atomic? I don't see
> > where you're adding any alternative, so is v6-m support
> > non-functional? That rather defeats the purpose of doing anything to
> > support it...
> 
> Correct, I haven't yet added an alternative. Arm's answer--and what we
> generally do in the embedded world--is to disable interrupts using
> "cpsid", do your thing, then re-enable interrupts with "cpsie". This
> could be done with a new "__a_cas_v6m" variant that I'd add to
> atomics.s. This still won't work for Linux because the "cps(ie|id)"
> instruction is effectively a no-op if it is executed in an
> unprivileged context (meaning you can't trap and emulate it). You'd be
> looking at another system call if you really wanted v6-m Linux. That
> said, this could let Musl work on v6-m in a bare metal or RTOS
> environment, which I think Musl would be great for, and so I'd still
> work on adding support for it. Also, not all v6-m devices support a
> privilege model; those that don't run as though everything is
> privileged. ARMv8-M.base is similar to v6-m but adds LDREX and STREX,
> and so that could have full support.

I'm not sure what the right answer for this is and whether it makes
support suitable for upstream or not at this point. We should probably
investigate further. If LDREX/STREX are trappable we could just use
them and require the kernel to trap and emulate but that's kinda ugly
(llsc-type atomics are much harder to emulate correctly than a cas
primitive).
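
For reference, the interrupt-masking fallback discussed above could look roughly like the following C sketch. It is not code from musl or from the patch: the names sketch_irq_disable, sketch_irq_enable and sketch_cas_irqmask are hypothetical, and the two hooks are host-side stand-ins for "cpsid i" and "cpsie i" so that the example builds and runs anywhere. It only behaves atomically on a single-core target running privileged.

```c
#include <assert.h>

/* Hypothetical hooks, not part of musl. On a privileged Cortex-M target
 * they would wrap "cpsid i" and "cpsie i"; here they only track nesting
 * so the sketch can be built and exercised on a host. */
static int irq_mask_depth;
static void sketch_irq_disable(void) { irq_mask_depth++; }
static void sketch_irq_enable(void)  { irq_mask_depth--; }

/* Compare-and-swap built on interrupt masking: if *p equals t, store n.
 * Always returns the value that was in *p on entry.
 * Assumption: interrupts were enabled on entry. A real version would
 * save and restore the previous mask state instead of unconditionally
 * re-enabling. */
static int sketch_cas_irqmask(volatile int *p, int t, int n)
{
    sketch_irq_disable();
    int old = *p;
    if (old == t)
        *p = n;
    sketch_irq_enable();
    return old;
}

/* Small usage demo: one successful and one failed swap. */
void sketch_cas_demo(void)
{
    volatile int lock = 0;
    assert(sketch_cas_irqmask(&lock, 0, 1) == 0); /* swap happens */
    assert(sketch_cas_irqmask(&lock, 0, 2) == 1); /* swap refused */
    assert(lock == 1);
}
```

Note that if the hooks silently do nothing, as "cpsid" does in an unprivileged context, this degrades to a plain non-atomic read-modify-write with no indication of failure.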

> > > > > diff --git a/src/ldso/arm/tlsdesc.S b/src/ldso/arm/tlsdesc.S
> > > > > index 3ae133c9..1843d97d 100644
> > > > > --- a/src/ldso/arm/tlsdesc.S
> > > > > +++ b/src/ldso/arm/tlsdesc.S
> > > > > @@ -12,13 +12,19 @@ __tlsdesc_static:
> > > > >  .hidden __tlsdesc_dynamic
> > > > >  .type __tlsdesc_dynamic,%function
> > > > >  __tlsdesc_dynamic:
> > > > > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> > > > >       push {r2,r3,ip,lr}
> > > > > +#else
> > > > > +     push {r2-r4,lr}
> > > > > +     mov r2,ip
> > > > > +     push {r2}
> > > > > +#endif
> > > >
> > > > AFAICT ip is not special here. If it's a pain to push/pop, just use a
> > > > different register.
> > >
> > > Gotcha. It looks like there isn't really a good reason for the
> > > original to use IP, either, so I could change that and merge a couple
> > > of those alternate code paths. If you don't mind a few extra
> > > instructions, I think I can get rid of all of the (__thumb2__ ||
> > > !__thumb__) checks in there.
> >
> > This is an extreme hot path so it should probably be optimized with
> > preprocessor where it helps, but that doesn't mean it should have
> > gratuitous PP branches just to pick which register to use when a
> > common one that works everywhere can be chosen.
> 
> Perhaps using ip in the original code is to save pushing another
> register to memory so I won't mess with it since this is a hot path.
> 
> Does this function need to push r2 and r3 and pop them at the end? My
> understanding is that the Arm procedure call standard does not require
> subroutines to preserve r0-r3 unless their contents are needed across
> a function call that may clobber them. The r3 register isn't used here
> until after the call to __a_gettp_ptr. The r2 register is used across
> the call, but the code does not make an attempt to preserve it before
> the call or restore it after. __a_gettp_ptr shouldn't mess with it
> anyway. The default version just returns a CP15 register and the
> Cortex-M version will trigger a syscall, which is treated like other
> interrupts and r0-r3+ip are automatically stacked and unstacked on
> Cortex-M interrupts.

Both the TLSDESC resolver function and __aeabi_read_tp (which
__a_gettp_ptr needs to be able to act as a backend for) have special
ABI requirements severely restricting what they can clobber. I don't
recall the exact details right off but it's pretty much "nothing"
except the return value.

> > With M profile support, though, AIUI it's possible that you have the
> > atomics but not the thread pointer. You should not assume that lack of
> > HWCAP_TLS implies lack of the atomics; rather you just have to assume
> > the atomics are present, I think, or use some other means of detection
> > with fallback to interrupt masking (assuming these devices have no
> > kernel/user distinction that prevents you from masking interrupts).
> > HWCAP_TLS should probably be probed just so you don't assume the
> > syscall exists in case a system omits it and does trap-and-emulate for
> > the instruction instead.
> 
> I think I'm starting to understand this, which is good because it's
> looking like my startup code for the micros will need to properly set
> HWCAP before Musl can be used. I assume I'll need to set that
> 'aux{"AT_PLATFORM"}' to "v6" or "v7" as well to make this runtime
> detection work properly. I'll have to figure out if "v6m" and "v7m"
> are supported values for the platform. I may have more questions in
> the future as I try actually implementing something.

Yes that sounds right. There are other aux vector entries that have to
be set correctly too for startup code, particularly AT_PHDR for
__init_tls to find the program headers (and for dl_iterate_phdr to
work). On some archs AT_HWCAP and AT_PLATFORM are also needed for
detection of features. AT_MINSIGSTKSZ is needed if the signal frame
size is variable and may exceed the default one defined in the macro.
AT_RANDOM is desirable for hardening but not mandatory. AT_EXECFN is
used as a fallback for program_invocation_name if argv[0] is missing.
AT_SYSINFO_EHDR is used to offer vdso but is optional. And AT_*ID and
AT_SECURE are used to control behavior under suid (not trusting the
environment, etc.).
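
As an illustration of the above, bare-metal startup code might assemble a minimal aux vector along these lines. This is a sketch under assumptions, not musl code: sketch_fill_auxv, sketch_auxv_get and SKETCH_HWCAP_TLS are hypothetical names, the bit value is assumed to match Linux's ARM HWCAP_TLS, and AT_PHENT/AT_PHNUM are included on the assumption that whatever walks the program headers from AT_PHDR also needs their size and count. The AT_* constants come from <elf.h>, which a bare-metal toolchain may not ship.

```c
#include <assert.h>
#include <elf.h>     /* AT_* constants; may need replacing on bare metal */
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the Linux ARM HWCAP_TLS bit; verify for your target. */
#define SKETCH_HWCAP_TLS (1UL << 15)

/* Fill auxv with key/value pairs and an AT_NULL terminator.
 * Returns the number of array slots written (two per entry).
 * The caller must provide room for at least 12 slots. */
static size_t sketch_fill_auxv(unsigned long *auxv, const void *phdr,
                               unsigned long phent, unsigned long phnum,
                               unsigned long hwcap, const char *platform)
{
    size_t i = 0;
    auxv[i++] = AT_PHDR;     auxv[i++] = (unsigned long)(uintptr_t)phdr;
    auxv[i++] = AT_PHENT;    auxv[i++] = phent;
    auxv[i++] = AT_PHNUM;    auxv[i++] = phnum;
    auxv[i++] = AT_HWCAP;    auxv[i++] = hwcap;
    auxv[i++] = AT_PLATFORM; auxv[i++] = (unsigned long)(uintptr_t)platform;
    auxv[i++] = AT_NULL;     auxv[i++] = 0;
    return i;
}

/* Look up one key; returns 0 if the key is absent. */
static unsigned long sketch_auxv_get(const unsigned long *auxv,
                                     unsigned long key)
{
    for (; auxv[0] != AT_NULL; auxv += 2)
        if (auxv[0] == key)
            return auxv[1];
    return 0;
}

/* Small usage demo with placeholder values. */
void sketch_auxv_demo(void)
{
    unsigned long av[12];
    static const char plat[] = "v7";
    int fake_phdr = 0; /* stands in for the real program header table */
    sketch_fill_auxv(av, &fake_phdr, 32, 3, SKETCH_HWCAP_TLS, plat);
    assert(sketch_auxv_get(av, AT_PHNUM) == 3);
    assert(sketch_auxv_get(av, AT_RANDOM) == 0); /* optional, omitted */
}
```

Entries such as AT_RANDOM, AT_EXECFN and AT_SECURE would be appended the same way before the AT_NULL terminator.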

> Like I mentioned above, I can make a routine to temporarily mask
> interrupts that could be used for v6-m or as a fallback/default for
> other M-Profile devices; however, this only works properly if the
> application is running in a privileged mode like one might do on bare
> metal or an RTOS (not all v6-m devices have an MPU and so not all have
> a concept of privilege anyway). The original code looks like it needs
> to handle the case of one running a Musl built for ARMv6 on something
> newer like ARMv7-A (I assume that's why there's "v7" versions of
> functions in atomics.s). I'll try to follow that as well.

We should try to figure out if there's a reliable way to do this whose
failure mode isn't silently doing the wrong thing.

Rich
