Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 15 Feb 2024 01:03:53 +0100
From: Szabolcs Nagy <>
To: Rich Felker <>
Cc: William Roberts <>, enh <>,
Subject: Re: PAC/BTI Support on aarch64

* Rich Felker <> [2024-02-12 21:08:34 -0500]:
> On Mon, Feb 12, 2024 at 05:18:22PM -0600, William Roberts wrote:
> > On Mon, Feb 12, 2024 at 5:05 PM enh <> wrote:
> > > On Mon, Feb 12, 2024 at 2:46 PM Rich Felker <> wrote:
> > > > On Mon, Feb 12, 2024 at 03:25:48PM -0600, William Roberts wrote:
> > > > > It's a matter of building with -mbranch-protection=standard
> > > > >
> > > > > Just the ASM labels need the first instruction to be a BTI. They're in
> > > > > the NOP space
> > > > > so they are backwards compatible, older hardware will just NOP it.

not quite that simple. sorry long brain dump follows:

tl;dr: i think the main issues are asm handling (property notes
and cfi for pac-ret), property note handling in, perf
overhead and possible compat issues of pac-ret (not possible
to disable per process) and testing (ensuring the code works
when pac/bti is not nop).

- asm code needs manual marking.

GNU_PROPERTY note in asm is ugly and error prone, see

i.e. no equivalent to -Wa,--noexecstack we use for GNU_STACK
(this was an oversight, llvm added an option, binutils gas has none).

- if asm code does indirect tailcall it should use x16 or x17.

bti c is compatible with the indirect branch that way.

- only functions that *maybe called indirectly* need bti c.

this turned out to be trickier than expected: lld/llvm and bfd.ld/gcc
currently disagree about the definition in case of linker inserted
veneers: gcc assumes ld handles the case if an inserted veneer
indirectly branches to a non-bti location, llvm emits bti in all
functions (including local ones without their address taken), just
in case ld inserts an indirect veneer.

there is some difference to how PLTs are emitted between the
linkers, but i don't think that causes compat issues (might cause
trouble for tools that try to interpret the PLT).

- dynamic linker has to figure out when to enable it.

systemd MDWE (memory deny write exec) feature used to seccomp
filter mprotect(PROT_EXEC) so even if the underlying mapping was
already PROT_EXEC and it just added a PROT_BTI on top, mprotect
would fail. this was fixed by adding an MDWE prctl to linux for
systemd to make it stop using that filter, but there may be other
similar seccomp filters and old kernels without MDWE so glibc
re-mmaps the exec segment (which systemd happily accepts).

note: the bit that tells if a load segment needs to be mapped as
PROT_BTI is in the load segment (usually program headers are in
the executable segment) so first mmap cannot get it right, unless
a quick read of the prog headers are done before mmap but that has
a lot of failure modes (the size can be unbounded) and does not
gain much compared to just mmap twice. except when the exe is mapped
by the kernel, then we cannot mmap since there is no fd and mprotect
may fail).
MDWE prctl:

another detail is that static-exe / / vdso marking is handled
by the kernel, while other dsos are PROT_BTI marked by an
interesting case is dynamic linked exe which was originally handled
by, but after the MDWE fiasco the kernel started loading it
with PROT_BTI (i.e. now all binaries mapped by the kernel are BTI
protected by the kernel). should also take care to gracefully handle BTI protection
failure (or invalid notes) in dlopen. (although invalid note is
more of an x86 thing.)

- special functions may need to return indirectly instead of ret

the only really nasty one is swapcontext which is not supported by
musl so that's good (and it is only a problem if bti is used with
a shadow-stack-like feature).

for returns_twice functions (like setjmp) the compiler emits bti j
at the call site so the second return can use an indirect branch.
return from unwinder to an exception handling landing pad via
indirect branch is supported too. otherwise return via indirect
branch is not supported (no bti j at call sites).

- a64fx (hpc core) implemented hint nops in a slow way

so glibc only adds bti if glibc is configured for bti.

> > > > I think it's a little more elaborate than that. Those asm instructions
> > > > need to be added (probably as .instr or .word or something, unless
> > > > there's a way to spell this particular nop that existing tooling will
> > > > understand).
> > 
> > You just use the hint <immediate> instructions, they are understood by old
> > toolchains. But you can only support a subset of the BTI/PAC instructions

yes 'hint <imm>' is armv8.0-a, part of the base isa.

i think the non-hint pac instructions are not relevant to musl.

> > but it's been enough for most projects that follow the normal ABI conventions
> > like OpenSSL/BoringSSL,etc, but not enough for libffi for example.

as far as i know openssl is the reason android does not enable pac:
they added the hint instructions incorrectly at first so there are
binaries that fail if the hw enables pac.

this reveals another issue with pac/bti: since they are nops on most
existing hw, they are not properly tested (e.g. by distro QA) so a
binary can look ok until you move it to a newer machine. (but recent
amazon graviton 4 has pac/bti so we may get more coverage soon.)

> > > > I also wondered if [sig]setjmp/longjmp would be affected, but probably
> > > > not.
> > >
> > > bionic does use PAC, but i think glibc has its own "pointer mangling" thing?
> > 
> > You need it, as the first instruction from a branch (where longjmp returns to)
> > needs to be a BTI instruction.
> Is that different from a normal function return?
> Note that in the case of sigsetjmp, (sig)longjmp returns to a point
> inside the sigsetjmp asm, so that point needs the annotation I think.

setjmp/sigsetjmp has to decide how to protect the longjmp return.

with a shadow-stack-like bw-edge-cfi, longjmp cannot return with
ret (first ret from setjmp consumes the return address from the
shadow stack), it must use indirect branch. (there is now gcs
which is aarch64 shadow stack, linux support is in progress).

since jmpbuf does not expose the return address representation a
libc specific protection can be applied (mangling or pac) and then
longjmp can use ret and remain protected. (but e.g. setcontext
exposes the pc/lr so those cannot be mangled in memory).

compilers emit bti j at the call site of returns_twice functions
to allow both ret and indirect branch. musl sigsetjmp can decide
what it does.

> > > > > It's been done for many projects, glibc and bionic have it. The
> > > > > problem with BTI is that when one item in the link
> > > > > list doesn't support BTI the loader/linker turns it off. So when it's
> > > > > something like a libc that is fundamental in the link chain,
> > > > > it turns it off for everything.
> > > >
> > > > This presumably requires some kind of machinery for how dynamic
> > > > linking will work, and possibly turning it off if a library without it
> > > > is dlopened?
> > > >
> > > > My understanding doing some brief searches though was that you can
> > > > individually mprotect it off in certain regions. So maybe it's
> > > > possible to just enable only for DSOs that support it?
> > >
> > > correct.
> OK, that's good to know. So which direction is it? Do DSOs that
> support BTI need it explicitly turned on via mprotect/mmap flags? Or
> is there some process-global flag to turn it on, and then ones that
> don't support it need it turned off?

per dso marking and explicit PROT_BTI setting.

> I suspect it's possible to first enable BTI for third-party libraries
> as a feature of the dynamic linker, and add BTI support for libc
> itself as a separate thing. That might be a nice factoring to make
> changes minimal and easy for ppl to read.

for third-party, it is enough to fix crt*.o and handle markings

(but of course for bti to be effective the libc must have it too,
otherwise there are likely enough gadgets to do whatever)

> The changes in dynlink.c should be as arch-agnostic as possible. If
> there's a corresponding feature on other archs, it should use the same
> basic code, with arch-specific headers (arch/$ARCH/reloc.h) defining
> the mechanisms for evaluating if an ELF file is compatible, how to do
> the mprotect, etc.

generic code should work.

but e.g. see above about who handles the marking (kernel vs,
turns out x86 (and likely aarch64, riscv) shadow stack uses different
rules: always libc handles the marking. so there are caveats.

> > > > > The initial scope of code changes would be what's reported when
> > > > > LDFLAGS=-Wl,-zforce-bti,--fatal-warnings
> > > >
> > > > Is there a way to disable these warnings so that every asm file does
> > > > not need to be cluttered with annotations?
> > >
> > > well, that's the ELF note stuff i was talking about, and if you don't
> > > have it you'll fall foul of the static linker saying "not all this
> > > code is BTI-enabled, therefore this .so isn't", and the dynamic linker
> > > doing nothing because the static linker effectively tells it not to.
> > 
> > Yep, well said ENH. It's been since Android since we crossed paths :-).
> > 
> > It's not that hard to annotate an asm file :-p I forget what project
> > (I think it was gnutls, but they just use openssl's code for the asm)
> > but I just put it in a header file and by virtue of #include'ing it you get the
> > notes added.
> Yes, we generally don't do that. There are no "asm headers" in musl;
> all asm files are self-contained and readable standalone. So if
> there's no way to tell the assembler/linker from the command line that
> files are BTI-compatible without generating a huge load of warning
> spam, I guess it's a mess of copy-and-paste...

currently there is only the ugly asm directives, see above.

final notes:

bti is fairly deployable (iirc x86 ibt failed because it is not per
dso, so dlopen does not really work, but aarch64 does not have that
problem), not strong security (the final binary is littered with
bti j/c so plenty opportunity to misdirect an indirect jump/call),
but at least it has minimal impact (minimal compatibility issues and
on a modern core even when bti is enabled it should not be slower).

for pac every function can independently decide if it uses pac-ret
(aka return address signing), no need for per dso marking. however
it has bigger compat as well as performance impact:

there are custom unwinders, pac-ret uses a new dwarf cfi (to mark
the code regions where the return address is signed), custom
unwinders may not understand this and that's a runtime crash.

some code looks at or modifies the return address (various hacks), such
code needs to be updated or not use pac-ret in relevant functions, but
such issues are hard to discover without hw.

pac is a per-boot system-wide setting, not per-process, so if there is
any issue or bug there is no way to disable it for one broken binary
(nowadays there is a disable prctl, but it is documented to slow the
system down, so not suitable for working around perf issues).

on simple cores pac can be slow (can add latency to non-leaf functions)

with 48bit va space, there is only 7bit pac, i.e. limited protection.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.