Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 18 Dec 2020 02:15:20 -0500
From: Jesse DeGuire <jesse.a.deguire@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Musl on Cortex-M Devices

On Wed, Dec 16, 2020 at 7:23 PM Rich Felker <dalias@...c.org> wrote:
>
> On Wed, Dec 16, 2020 at 06:43:15PM -0500, Jesse DeGuire wrote:
> > Hey everyone,
> >
> > I'm working on putting together a Clang-based toolchain to use with
> > Microchip PIC32 (MIPS32) and SAM (Cortex-M) microcontrollers as an
> > alternative to their paid XC32 toolchain and I'd like to use Musl as
> > the C library. Currently, I'm trying to get it to build for a few
> > different Cortex-M devices and have found that Musl builds fine for
> > ARMv7-M, but not for ARMv6-M or v8-M Baseline because Musl uses
> > instructions not supported on the Thumb ISA subset used by v6-M and
> > v8-M Baseline devices. I'm using the v1.2.1 tag version of Musl, but
> > can easily switch to HEAD if needed. I am using a Python script to
> > build Musl (and eventually the rest of the toolchain), which you can
> > see on GitHub at the following link. It's a bit of a mess at the
> > moment, but the build_musl() function is what I'm currently using to
> > build Musl.
>
> I had assumed the thumb1[-ish?] subset wouldn't be interesting, but if
> it is, let's have a look.
>
> > https://github.com/jdeguire/buildPic32Clang/blob/master/buildPic32Clang.py
> >
> > Anyway, I have managed to get Musl to build for v6-M, v7-M, and v8-M
> > Baseline and have attached a diff to this email. If you like, I can go
> > into more detail as to why I made the changes I made; however, many
> > changes were merely the result of my attempts to correct errors
> > reported by Clang due to it encountering instruction sequences not
> > supported on ARMv6-M.
>
> Are there places where clang's linker is failing to make substitutions
> that the GNU one can do, that would make this simpler? For example I
> know the GNU one can replace bx rn by mov pc,rn if needed (based on a
> relocation the assembler emits on the insn).

The main issue I ran into is that ARMv6-M and v8-M.base have many
instructions that can accept only registers R0-R7 as input, so some
extra care or shuffling is required. Many of these were pretty easily
worked around.

> > A number of errors were simply the result of
> > ARMv6-M requiring one to use the "S" variant of an instruction that
> > sets status flags (such as "ADDS" vs "ADD" or "MOVS" vs "MOV"). A few
> > files I had to change from a "lower case s" to a "capital-S" file so
> > that I could use macros to check for either the Thumb1 ISA
> > ("__thumb2__ || !__thumb__") or for an M-Profile device
> > ("!__ARM_ARCH_ISA_ARM").
>
> Is __ARM_ARCH_ISA_ARM universally available (even on old compilers)?
> If not this may need an alternate detection. But I'd like to drop as
> much as possible and just make the code compatible rather than having
> 2 versions of it. I don't think there are any places where the
> performance or size is at all relevant.

I hadn't thought enough about that until your reply, so I used
Compiler Explorer to at least get a general idea of when that macro
was supported. Clang 3.5 supports it, but Clang 3.4.1 does not. GCC
5.4 supports it, but 4.64 (the previous version available on Compiler
Explorer) does not. With that, I tried to come up with something to
detect M-Profile devices and have updated my changes with it.

   #if __ARM_ARCH_6M__ || __ARM_ARCH_7M__ || __ARM_ARCH_7EM__ \
     || (__ARM_ARCH > 7 && !__ARM_ARCH_ISA_ARM)

This presumes that the __ARM_ARCH_xx__ macros were supported for as
long as that variant was supported and that any compiler that supports
an ARM arch greater than 7 is new enough to have the
__ARM_ARCH_ISA_ARM macro, but this should cover a wider range of
toolchains.

I also did merge some of the alternate paths like you suggested.

> > The changes under
> > "src/thread/arm/__set_thread_area.c" are different in that I made
> > those because I don't believe Cortex-M devices could handle what was
> > there (no M-Profile device has Coprocessor 15, for example) and so I
>
> Unless this is an ISA level that can't be executed on a normal (non-M)
> ARM profile, it still needs all the backends that might be needed and
> runtime selection of which to use. This is okay. I believe Linux for
> nommu ARM has a syscall for get_tp, which is rather awful but probably
> needs to be added as a backend. The right way to do this would have
> been with trap-and-emulate (of cp15) I think...

Out of curiosity, what makes that syscall awful?

> > sort of punted for now and figured that a user's fault handler could
> > handle those "_kuser" calls given that there probably won't be a
> > kernel to do so.
>
> I don't think you can use the kuser_helper versions here because (1)
> they're at ARM addresses, not thumb ones, and (2) they're in the
> address range that Cortex M uses for special stuff.

Ah, right you are! It turns out that the range 0xE0100000 and above is
a "vendor system region", presumably for vendors to add their own
system control or debug features on top of what ARM already provides
and so one cannot assume that the region is empty.

> > Unfortunately, I'm not far enough yet in my project to build code that
> > would actually run on one of the micros I have with me, so I haven't
> > yet been able to properly try out these changes. Also, I should
> > mention that my eventual plan is to make a "baremetal" layer that
> > would provide stubs to handle syscalls or thread-related things that
> > you wouldn't normally have on a microcontroller, but I want to get
> > Musl building without that first because I think I'll be able to
> > utilize a lot of what's already present when I get to that point.
> > Hopefully, what's here is still somewhat useful as a starting point
> > should Musl want to support Cortex-M. I'll try to update this diff as
> > a result of feedback or my own eventual testing (I'm a long way from
> > that, though!).
>
> Thanks for the update. Please try to attach as UTF-8 rather than
> UTF-16 next time so it's more readable/accessible. I had to guess it
> was UTF-16 and switch the mime encoding headers to read it.

Oops, sorry about that. I must have accidentally created the diff with
PowerShell instead of my normal WSL Bash terminal.

> Further comments inline below:
>
> > diff --git a/arch/arm/atomic_arch.h b/arch/arm/atomic_arch.h
> > index 9e3937cc..21f102ad 100644
> > --- a/arch/arm/atomic_arch.h
> > +++ b/arch/arm/atomic_arch.h
> > @@ -64,11 +64,18 @@ static inline int a_cas(volatile int *p, int t, int s)
> >
> >  #ifndef a_barrier
> >  #define a_barrier a_barrier
> > +#if __ARM_ARCH_ISA_ARM
> >  static inline void a_barrier()
> >  {
> >       register uintptr_t ip __asm__("ip") = __a_barrier_ptr;
> >       __asm__ __volatile__( BLX " ip" : "+r"(ip) : : "memory", "cc", "lr" );
> >  }
> > +#else
> > +static inline void a_barrier()
> > +{
> > +     __asm__ __volatile__ ("dmb" : : : "memory");
> > +}
> > +#endif
> >  #endif
>
> There's a condition above with this code already; you just need to
> make it used for an additional case.

Fixed. The original code was nested inside an ifdef that also would
have used LDREX and STREX--neither of which ARMv6-M supports--so I
moved that out and added the extra check.

> >  #define a_crash a_crash
> > diff --git a/arch/arm/crt_arch.h b/arch/arm/crt_arch.h
> > index 99508b1d..c48b14b0 100644
> > --- a/arch/arm/crt_arch.h
> > +++ b/arch/arm/crt_arch.h
> > @@ -3,13 +3,25 @@ __asm__(
> >  ".global " START " \n"
> >  ".type " START ",%function \n"
> >  START ": \n"
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  "    mov fp, #0 \n"
> >  "    mov lr, #0 \n"
> > +#else
> > +"    movs a3, #0 \n"
> > +"    mov fp, a3 \n"
> > +"    mov lr, a3 \n"
> > +#endif
> >  "    ldr a2, 1f \n"
> >  "    add a2, pc, a2 \n"
> >  "    mov a1, sp \n"
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  "2:  and ip, a1, #-16 \n"
> >  "    mov sp, ip \n"
> > +#else
> > +"2:  subs a3, #16 \n"
> > +"    ands a1, a3 \n"
> > +"    mov sp, a1 \n"
> > +#endif
> >  "    bl " START "_c \n"
> >  ".weak _DYNAMIC \n"
> >  ".hidden _DYNAMIC \n"
>
> This can just use the new code unconditionally.

Done.

> > diff --git a/arch/arm/pthread_arch.h b/arch/arm/pthread_arch.h
> > index e689ea21..a89fb554 100644
> > --- a/arch/arm/pthread_arch.h
> > +++ b/arch/arm/pthread_arch.h
> > @@ -1,5 +1,5 @@
> > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
> > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
>
> That's probably not the right condition.

Can you provide more detail as I'm missing what you mean? The intent
was to prevent this check from succeeding for any M-Profile device so
that they fall back to the __a gettp_ptr solution. I did update this
check to what I described at the top of this reply.

> >  static inline pthread_t __pthread_self()
> >  {
> > diff --git a/configure b/configure
> > old mode 100755
> > new mode 100644
> > diff --git a/crt/arm/crtn.S b/crt/arm/crtn.S
> > index dc020f92..83e20da2 100644
> > --- a/crt/arm/crtn.S
> > +++ b/crt/arm/crtn.S
> > @@ -1,9 +1,17 @@
> >  .syntax unified
> >
> >  .section .init
> > +#if __ARM_ARCH_ISA_ARM
> >       pop {r0,lr}
> >       bx lr
> > +#else
> > +     pop {r0,pc}
> > +#endif
> >
> >  .section .fini
> > +#if __ARM_ARCH_ISA_ARM
> >       pop {r0,lr}
> >       bx lr
> > +#else
> > +     pop {r0,pc}
> > +#endif
>
> Ideally the linker could handle this.

The issue here is that ARMv6-M (and I believe v8-M.base) cannot POP
into LR. Can this sequence replace LR with R1? My current version POPs
into R1 and moves that into LR because I wasn't sure if that was
important for the caller.

> > diff --git a/src/ldso/arm/tlsdesc.S b/src/ldso/arm/tlsdesc.S
> > index 3ae133c9..1843d97d 100644
> > --- a/src/ldso/arm/tlsdesc.S
> > +++ b/src/ldso/arm/tlsdesc.S
> > @@ -12,13 +12,19 @@ __tlsdesc_static:
> >  .hidden __tlsdesc_dynamic
> >  .type __tlsdesc_dynamic,%function
> >  __tlsdesc_dynamic:
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       push {r2,r3,ip,lr}
> > +#else
> > +     push {r2-r4,lr}
> > +     mov r2,ip
> > +     push {r2}
> > +#endif
>
> AFAICT ip is not special here. If it's a pain to push/pop, just use a
> different register.

Gotcha. It looks like there isn't really a good reason for the
original to use IP, either, so I could change that and merge a couple
of those alternate code paths. If you don't mind a few extra
instructions, I think I can get rid of all of the (__thumb2__ ||
!__thumb__) checks in there.

> >       ldr r1,[r0]
> >       ldr r2,[r1,#4]  // r2 = offset
> >       ldr r1,[r1]     // r1 = modid
> >
> > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
> > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
> >       mrc p15,0,r0,c13,c0,3
> >  #else
> >       ldr r0,1f
> > @@ -36,19 +42,34 @@ __tlsdesc_dynamic:
> >       bx r0
> >  #endif
> >  #endif
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       ldr r3,[r0,#-4] // r3 = dtv
> >       ldr ip,[r3,r1,LSL #2]
> >       sub r0,ip,r0
> > +#else
> > +     mov r4,r0
> > +     subs r4,#4
> > +     ldr r3,[r4]
> > +     lsls r4,r1,#2
> > +     ldr r4,[r3,r4]
> > +     subs r0,r4,r0
> > +#endif
> >       add r0,r0,r2    // r0 = r3[r1]-r0+r2
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  #if __ARM_ARCH >= 5
> >       pop {r2,r3,ip,pc}
> >  #else
> >       pop {r2,r3,ip,lr}
> >       bx lr
> >  #endif
> > +#else
> > +     pop {r2}
> > +     mov ip,r2
> > +     pop {r2-r4,pc}
> > +#endif
> >
> > -#if ((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > - || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7
> > +#if __ARM_ARCH_ISA_ARM && (((__ARM_ARCH_6K__ || __ARM_ARCH_6KZ__ || __ARM_ARCH_6ZK__) && !__thumb__) \
> > + || __ARM_ARCH_7A__ || __ARM_ARCH_7R__ || __ARM_ARCH >= 7)
> >  #else
> >       .align 2
> >  1:   .word __a_gettp_ptr - 2b
> > diff --git a/src/process/arm/vfork.s b/src/process/arm/vfork.s
> > index d7ec41b3..b6f0260e 100644
> > --- a/src/process/arm/vfork.s
> > +++ b/src/process/arm/vfork.s
> > @@ -3,7 +3,7 @@
> >  .type vfork,%function
> >  vfork:
> >       mov ip, r7
> > -     mov r7, 190
> > +     movs r7, 190
> >       svc 0
> >       mov r7, ip
> >       .hidden __syscall_ret
>
> OK.
>
> > diff --git a/src/setjmp/arm/longjmp.S b/src/setjmp/arm/longjmp.S
> > index 8df0b819..745c9ba2 100644
> > --- a/src/setjmp/arm/longjmp.S
> > +++ b/src/setjmp/arm/longjmp.S
> > @@ -5,6 +5,7 @@
> >  .type longjmp,%function
> >  _longjmp:
> >  longjmp:
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       mov ip,r0
> >       movs r0,r1
> >       moveq r0,#1
> > @@ -16,7 +17,7 @@ longjmp:
> >       ldr r2,1f
> >       ldr r1,[r1,r2]
> >
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       tst r1,#0x260
> >       beq 3f
> >       // HWCAP_ARM_FPA
> > @@ -31,7 +32,7 @@ longjmp:
> >       .fpu softvfp
> >       .eabi_attribute 10, 0
> >       .eabi_attribute 27, 0
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       // HWCAP_ARM_IWMMXT
> >  2:   tst r1,#0x200
> >       beq 3f
> > @@ -42,6 +43,36 @@ longjmp:
> >       ldcl p1, cr14, [ip], #8
> >       ldcl p1, cr15, [ip], #8
> >  #endif
> > +#else
> > +     mov ip,r0
> > +     movs r0,r1
> > +     bne 4f
> > +     movs r0,#1
> > +4:   mov r1,ip
> > +     adds r1,#16
> > +     ldmia r1!, {r2-r7}
> > +     mov lr,r7
> > +     mov sp,r6
> > +     mov r11,r5
> > +     mov r10,r4
> > +     mov r9,r3
> > +     mov r8,r2
> > +     mov ip,r1
> > +     subs r1,#40
> > +     ldmia r1!, {r4-r7}
> > +
> > +     adr r1,1f
> > +     ldr r2,1f
> > +     ldr r1,[r1,r2]
> > +     movs r2,#0x40
> > +     tst r1,r2
> > +     beq 2f
> > +     .fpu vfp
> > +     vldmia ip!, {d8-d15}
> > +     .fpu softvfp
> > +     .eabi_attribute 10, 0
> > +     .eabi_attribute 27, 0
> > +#endif
> >  2:
> >  3:   bx lr
>
> This should probably be unified somehow.

I was able to partially merge this.

> > diff --git a/src/setjmp/arm/setjmp.S b/src/setjmp/arm/setjmp.S
> > index 45731d22..440c166d 100644
> > --- a/src/setjmp/arm/setjmp.S
> > +++ b/src/setjmp/arm/setjmp.S
> > @@ -8,6 +8,7 @@
> >  __setjmp:
> >  _setjmp:
> >  setjmp:
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       mov ip,r0
> >       stmia ip!,{v1,v2,v3,v4,v5,v6,sl,fp}
> >       mov r2,sp
> > @@ -18,7 +19,7 @@ setjmp:
> >       ldr r2,1f
> >       ldr r1,[r1,r2]
> >
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       tst r1,#0x260
> >       beq 3f
> >       // HWCAP_ARM_FPA
> > @@ -33,7 +34,7 @@ setjmp:
> >       .fpu softvfp
> >       .eabi_attribute 10, 0
> >       .eabi_attribute 27, 0
> > -#if __ARM_ARCH < 8
> > +#if __ARM_ARCH_ISA_ARM  &&  __ARM_ARCH < 8
> >       // HWCAP_ARM_IWMMXT
> >  2:   tst r1,#0x200
> >       beq 3f
> > @@ -44,6 +45,30 @@ setjmp:
> >       stcl p1, cr14, [ip], #8
> >       stcl p1, cr15, [ip], #8
> >  #endif
> > +#else
> > +     stmia r0!,{r4-r7}
> > +     mov r1,r8
> > +     mov r2,r9
> > +     mov r3,r10
> > +     mov r4,r11
> > +     mov r5,sp
> > +     mov r6,lr
> > +     stmia r0!,{r1-r6}
> > +     mov ip,r0
> > +     movs r0,#0
> > +
> > +     adr r1,1f
> > +     ldr r2,1f
> > +     ldr r1,[r1,r2]
> > +     movs r2,#0x40
> > +     tst r1,r2
> > +     beq 2f
> > +     .fpu vfp
> > +     vstmia ip!, {d8-d15}
> > +     .fpu softvfp
> > +     .eabi_attribute 10, 0
> > +     .eabi_attribute 27, 0
> > +#endif
> >  2:
> >  3:   bx lr
>
> Same here.

I partially merged this as well.

> > diff --git a/src/signal/arm/restore.s b/src/signal/arm/restore.s
> > index fb086d9b..2b7621b1 100644
> > --- a/src/signal/arm/restore.s
> > +++ b/src/signal/arm/restore.s
> > @@ -4,12 +4,12 @@
> >  .hidden __restore
> >  .type __restore,%function
> >  __restore:
> > -     mov r7,#119
> > +     movs r7,#119
> >       swi 0x0
> >
> >  .global __restore_rt
> >  .hidden __restore_rt
> >  .type __restore_rt,%function
> >  __restore_rt:
> > -     mov r7,#173
> > +     movs r7,#173
> >       swi 0x0
>
> Probably OK, unles gdb does something ridiculous wanting the exact
> insn sequence...
>
> > diff --git a/src/signal/arm/sigsetjmp.S b/src/signal/arm/sigsetjmp.S
> > index 69ebbf49..85a71666 100644
> > --- a/src/signal/arm/sigsetjmp.S
> > +++ b/src/signal/arm/sigsetjmp.S
> > @@ -9,16 +9,36 @@ __sigsetjmp:
> >       bne 1f
> >       b setjmp
> >
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >  1:   str lr,[r0,#256]
> >       str r4,[r0,#260+8]
> > +#else
> > +1:   mov ip,r2
> > +     mov r2,lr
> > +     adds r0,#200
> > +     str r2,[r0,#56]
> > +     str r4,[r0,#60+8]
> > +     subs r0,#200
> > +     mov r2,ip
> > +#endif
> >       mov r4,r0
> >
> >       bl setjmp
> >
> >       mov r1,r0
> >       mov r0,r4
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       ldr lr,[r0,#256]
> >       ldr r4,[r0,#260+8]
> > +#else
> > +     mov ip,r2
> > +     adds r0,#200
> > +     ldr r2,[r0,#56]
> > +     mov lr,r2
> > +     ldr r4,[r0,#60+8]
> > +     subs r0,#200
> > +     mov r2,ip
> > +#endif
>
> Ideally should be unified.

Done.

> >  .hidden __sigsetjmp_tail
> >       b __sigsetjmp_tail
> > diff --git a/src/string/arm/memcpy.S b/src/string/arm/memcpy.S
> > index 869e3448..c5d17533 100644
> > --- a/src/string/arm/memcpy.S
> > +++ b/src/string/arm/memcpy.S
> > @@ -43,6 +43,8 @@
> >   * building as thumb 2 and big-endian.
> >   */
> >
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> > +
> >  .syntax unified
> >
> >  .global memcpy
> > @@ -477,3 +479,4 @@ copy_last_3_and_return:
> >       ldmfd   sp!, {r0, r4, lr}
> >       bx      lr
> >
> > +#endif // defined(__thumb2__)  ||  !defined(__thumb__)
>
> The logic to enable the C memcpy seems to be missing.

I wasn't actually sure of the best way to do this. In my current
version, I sort of followed the lead of the stuff in src/math/arm and
made it a C file that will either conditionally have the ARM version
as inline assembly or include the generic C version. Is there a better
way? The diff didn't capture this because of the rename; should I
attach it separately?

> > diff --git a/src/thread/arm/__set_thread_area.c b/src/thread/arm/__set_thread_area.c
> > index 09de65aa..cb5d757d 100644
> > --- a/src/thread/arm/__set_thread_area.c
> > +++ b/src/thread/arm/__set_thread_area.c
> > @@ -26,7 +26,11 @@ extern hidden uintptr_t __a_barrier_ptr, __a_cas_ptr, __a_gettp_ptr;
> >
> >  int __set_thread_area(void *p)
> >  {
> > -#if !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
> > +#if !__ARM_ARCH_ISA_ARM
> > +     __a_gettp_ptr = __a_gettp_kuser;
> > +     __a_cas_ptr = __a_cas_kuser;
> > +     __a_barrier_ptr = __a_barrier_kuser;
> > +#elif !__ARM_ARCH_7A__ && !__ARM_ARCH_7R__ && __ARM_ARCH < 7
> >       if (__hwcap & HWCAP_TLS) {
> >               size_t *aux;
> >               __a_cas_ptr = __a_cas_v7;
>
> HWCAP needs to be honored here. You also need to add the new logic to
> use the syscall for getting thread pointer, I think.

I updated this to use the "get_tls" syscall (0xf0006) for M-Profile
devices. The barrier and cas pointers point to the default dummy
functions.

I'm not entirely sure how I should handle HWCAP. What should I do if
TLS is or isn't set? If it isn't set, the original code uses those
'kuser' addresses that you pointed out aren't usable and will bail out
if the version at the special address isn't correct. Right now, I have
mine set to call a_crash() if TLS is not set in HWCAP because that
seems to be the closest I can come to the current functionality.

> > diff --git a/src/thread/arm/__unmapself.s b/src/thread/arm/__unmapself.s
> > index 29c2d07b..fe0406e3 100644
> > --- a/src/thread/arm/__unmapself.s
> > +++ b/src/thread/arm/__unmapself.s
> > @@ -3,7 +3,7 @@
> >  .global __unmapself
> >  .type   __unmapself,%function
> >  __unmapself:
> > -     mov r7,#91
> > +     movs r7,#91
> >       svc 0
> > -     mov r7,#1
> > +     movs r7,#1
> >       svc 0
>
> OK, but this file can't be used on nommu anyway. Instead the generic C
> version has to be used.

LIke memcpy above (and the stuffi in src/math/arm), I made this a C
file that will either have this code as inline assembly or will
include the generic C version. The diff didn't capture this, should I
attach this separately?

> > diff --git a/src/thread/arm/atomics.S b/src/thread/arm/atomics.S
> > index da50508d..d49809d9 100644
> > --- a/src/thread/arm/atomics.S
> > +++ b/src/thread/arm/atomics.S
> > @@ -1,3 +1,5 @@
> > +#if __ARM_ARCH_ISA_ARM
> > +
> >  .syntax unified
> >  .text
> >
> > @@ -104,3 +106,5 @@ __a_cas_ptr:
> >  .hidden __a_gettp_ptr
> >  __a_gettp_ptr:
> >       .word __a_gettp_cp15
> > +
> > +#endif
>
> This file should not be conditional. If it can't be assembled in
> default mode it just needs ISA level override directives. Which
> functions are actually called is selected at runtime.

I removed the condition so that everything in here is always built and
added directives as needed.

I did run into an issue, though. Is there a way to restore the default
architecture after setting it using ".arch <arch>"? Something like
".set push" and ".set pop" in MIPS assembly? I didn't realize what the
"scope" of those .arch directives were until it occurred to me that
one of the functions shouldn't have built for ARMv6-M. I rearranged
the functions so that the ones without .arch directives come first,
which did take care of the problem for now. Trying ".arch" by itself
gets me an error from Clang.

> > diff --git a/src/thread/arm/clone.S b/src/thread/arm/clone.S
> > index bb0965da..70d29f70 100644
> > --- a/src/thread/arm/clone.S
> > +++ b/src/thread/arm/clone.S
> > @@ -4,24 +4,30 @@
> >  .hidden __clone
> >  .type   __clone,%function
> >  __clone:
> > -     stmfd sp!,{r4,r5,r6,r7}
> > -     mov r7,#120
> > +     push {r4,r5,r6,r7}
> > +     movs r7,#120
> >       mov r6,r3
> >       mov r5,r0
> >       mov r0,r2
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       and r1,r1,#-16
> > +#else
> > +     movs r2,#0
> > +     subs r2,#16
> > +     ands r1,r2
> > +#endif
> >       ldr r2,[sp,#16]
> >       ldr r3,[sp,#20]
> >       ldr r4,[sp,#24]
> >       svc 0
> >       tst r0,r0
> >       beq 1f
> > -     ldmfd sp!,{r4,r5,r6,r7}
> > +     pop {r4,r5,r6,r7}
> >       bx lr
> >
> >  1:   mov r0,r6
> >       bl 3f
> > -2:   mov r7,#1
> > +2:   movs r7,#1
> >       svc 0
> >       b 2b
>
> I think this can all be unconditional, no #if's.

Changed to remove the #ifs.

> > diff --git a/src/thread/arm/syscall_cp.S b/src/thread/arm/syscall_cp.S
> > index e607dd42..f53cbb3a 100644
> > --- a/src/thread/arm/syscall_cp.S
> > +++ b/src/thread/arm/syscall_cp.S
> > @@ -11,7 +11,7 @@
> >  .type __syscall_cp_asm,%function
> >  __syscall_cp_asm:
> >       mov ip,sp
> > -     stmfd sp!,{r4,r5,r6,r7}
> > +     push {r4,r5,r6,r7}
> >  __cp_begin:
> >       ldr r0,[r0]
> >       cmp r0,#0
> > @@ -19,11 +19,16 @@ __cp_begin:
> >       mov r7,r1
> >       mov r0,r2
> >       mov r1,r3
> > +#if defined(__thumb2__)  ||  !defined(__thumb__)
> >       ldmfd ip,{r2,r3,r4,r5,r6}
> > +#else
> > +     mov r2,ip
> > +     ldmfd r2,{r2,r3,r4,r5,r6}
> > +#endif
> >       svc 0
> >  __cp_end:
> > -     ldmfd sp!,{r4,r5,r6,r7}
> > +     pop {r4,r5,r6,r7}
> >       bx lr
> >  __cp_cancel:
> > -     ldmfd sp!,{r4,r5,r6,r7}
> > +     pop {r4,r5,r6,r7}
> >       b __cancel
> > diff --git a/tools/install.sh b/tools/install.sh
> > old mode 100755
> > new mode 100644
>
> This can be unconditional, and maybe a register other than ip can be
> used to make it simpler..?
>
> Rich

Removed the condition.

Thanks for your help!
Jesse

Download attachment "musl_cortexm_v2.diff" of type "application/octet-stream" (11640 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.