Date: Thu, 15 Feb 2024 09:06:40 -0500
From: Rich Felker <>
To: Stefan O'Rear <>
Cc:, Markus Wichmann <>,
	enh <>
Subject: Re: PAC/BTI Support on aarch64

On Thu, Feb 15, 2024 at 08:29:15AM -0500, Stefan O'Rear wrote:
> On Tue, Feb 13, 2024, at 9:19 PM, Rich Felker wrote:
> > What is the situation on x86? Does it use the same kind of per-page
> > enforcement mode, or is it only global, requiring disabling it if any
> > DSO lacks support? Is the endbr64 opcode a guaranteed-safe nop on
> > older ISA levels, or does it need to be conditional?
> The situation for hardware control flow hardening on risc-v is two
> in-development extensions:
> Zicfilp (landing pads) provides a 4-byte instruction which marks valid
> targets for indirect jumps and calls, written `lpad LABEL`.  This is
> an *architectural NOP at all ISA levels*.  Enforcement is
> process-global, not per-page.
> Indirect jumps can be exempted from landing pad depending on which
> register is used for the address; this is expected to be used if the
> address is obtained from read-only memory or an auipc instruction, so
> jump tables do not use landing pads, nor are landing pads needed after
> direct calls regardless of length.  A function which is not a visible
> symbol and does not have its address taken does not need a landing pad.
> The ABI function return is a member of the set of indirect jumps
> which bypass landing pad checks, so no landing pads are needed at the
> return sites of ABI function calls.  Zicfilp intentionally does not
> provide any protection against ROP; a different extension must be used
> to protect return addresses.

This all sounds very good and reasonable to support.

> Landing pads have a 20-bit label which is expected to be used for a
> function type signature, catching function type confusion events.
> The hashing scheme used to generate the label from the call signature
> has not yet been decided.  The call signature must be placed in the
> x7/t2 register prior to an indirect jump.  The immediate layout is
> such that indirect jump sites can use a single lui instruction with
> a matching 20-bit immediate.  Landing pads do not check x7/t2 if
> reached by a direct jump, so there is no need to initialize it prior
> to a direct jump.  A `lpad 0` matches any incoming type signature.
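
As a rough model of the label check described above (a sketch only: the
function names are hypothetical, the register-based exemption for some
indirect jumps is ignored, and the placement of the label in bits 31:12
of x7/t2 is inferred from the remark about a single lui):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LPAD_LABEL_MASK 0xFFFFFu  /* landing pad labels are 20 bits wide */

/* Hypothetical helper: value left in x7/t2 by `lui t2, label`.
 * lui places its 20-bit immediate in bits 31:12 of the register. */
static uint32_t lui_t2(uint32_t label)
{
    assert(label <= LPAD_LABEL_MASK);
    return label << 12;
}

/* Hypothetical model of the check performed at `lpad pad_label`.
 * Returns true if execution may continue, false if it would fault. */
static bool lpad_ok(uint32_t pad_label, uint32_t t2, bool indirect)
{
    if (!indirect)
        return true;   /* direct jumps never check x7/t2 */
    if (pad_label == 0)
        return true;   /* `lpad 0` matches any incoming signature */
    return ((t2 >> 12) & LPAD_LABEL_MASK) == pad_label;
}
```

With this model, an indirect call whose caller loaded a different label
than the one on the callee's landing pad is rejected, while the same
call into an `lpad 0` target is accepted.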

This is very interesting. I wonder if it will break code with UB like:

It's my belief that it *should* break such code, and that breaking it
would be a feature. But I could see folks making the choice to hash
just the "mechanical" types rather than actual types, and there may be
practical reasons this is what needs to be done.
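
One hypothetical illustration of the distinction (not taken from this
thread; the names cmp_int, cmp_int_void and sort_ints are made up): a
comparator whose actual type differs from what qsort calls through,
even though both signatures are "pointer, pointer -> int" mechanically.
A label hashed from actual types would presumably reject the cast
version; a label hashed from mechanical types would presumably accept
it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Actual type: int (*)(const int *, const int *). */
static int cmp_int(const int *a, const int *b)
{
    return (*a > *b) - (*a < *b);
}

/* Adapter with exactly the type qsort expects, so the indirect call
 * made inside qsort goes through a compatible function type. */
static int cmp_int_void(const void *a, const void *b)
{
    return cmp_int(a, b);
}

static void sort_ints(int *v, size_t n)
{
    /* The UB variant, deliberately not compiled here, would be:
     *   qsort(v, n, sizeof *v,
     *         (int (*)(const void *, const void *))cmp_int);
     * qsort would then call cmp_int through an incompatible type. */
    qsort(v, n, sizeof *v, cmp_int_void);
}
```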

Note that this also has implications for musl and whether we would
ever be able to redefine some opaque types. In fact, we already have
some types, like pthread_t, which are defined differently in
__cplusplus mode to match a name mangling ABI; these would be badly
broken. I'm not sure what the right fix for that would be. (Doing that
to begin with was almost surely a big mistake.)

> Zicfiss (shadow stacks) provides a new shadow stack pointer register
> and shadow stack memory which cannot be modified using ordinary stores.
> Unlike GCS and SHSTK, the shadow stack is never accessed automatically,
> "sspush ra" and "sspopchk ra" instructions must be added to the prologue
> and epilogue of functions which spill their return address to the stack.
> These instructions are NOPs if the shadow stack is disabled at runtime,
> but are *not architectural NOPs* and will trap if executed on current
> hardware.
> Also unlike GCS and SHSTK, the Zicfiss `ssp` register can be read and
> written from user mode using dedicated instructions, so no special
> mechanism is used for shadow stack switching.
> To my knowledge, nothing analogous to PAC is under development.

This is unfortunate, since PAC seems a lot less invasive and
actually-doable. However, protection equivalent to PAC also seems
possible in software, in an entirely arch-agnostic way, with overhead
only slightly higher than standard SSP... so I'm not sure why we
aren't just pursuing getting compilers to do that rather than chasing
arch-specific anti-ROP hacks vendors are trying to use to
differentiate themselves and remain relevant in the age of open

> Both shadow stacks and landing pads are enabled by bits in the senvcfg
> register, and are exposed via a prctl.  The shadow stack prctl is being
> developed as an architecture-independent API, which provides some form
> of automatic allocation and deallocation of shadow stacks for threads.
> I believe the current strategy for marking CFI support in binaries is
> an ELF note similar to the x86 approach, but have not checked this part
> in detail.

I know this should be written up in more detail, but based on request
on IRC, I think it would be good to go ahead and mention "in public"
on the list:

*** Any API for shadow stacks that involves automatic allocation and
deallocation which can fail "behind the application's back" at runtime
is a very poor candidate for support by musl. ***

To be supported, shadow stacks would probably need to use contiguous
memory (with special protections applied to it for the duration of its
usage as call stack, with automatic end to that status if it's
subsequently accessed with normal loads/stores) with the normal
application-provided stack, so as not to break sigaltstack,
pthread_attr_setstack, makecontext, etc. and not to introduce memory leaks
or conditions under which a behind-the-scenes allocation failure makes
hard program termination the only possible result.
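
For reference, a minimal sketch of the application-provided stack
pattern that has to keep working (the names worker, run_on_app_stack
and APP_STACK_SIZE are hypothetical, and the size is an arbitrary
assumption). The point is that the caller owns the memory and every
failure is reported back to it, with nothing allocated behind its back:

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#define APP_STACK_SIZE (256 * 1024)  /* arbitrary size for illustration */

static void *worker(void *arg)
{
    *(int *)arg = 1;   /* record that the thread ran on the given stack */
    return NULL;
}

/* Runs worker on a caller-allocated stack. Returns 0 on success and
 * -1 on any failure, so the application can handle the error itself. */
static int run_on_app_stack(int *ran)
{
    void *stack = mmap(NULL, APP_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    pthread_attr_t attr;
    pthread_t t;
    int rc = pthread_attr_init(&attr);
    if (rc == 0) {
        rc = pthread_attr_setstack(&attr, stack, APP_STACK_SIZE);
        if (rc == 0)
            rc = pthread_create(&t, &attr, worker, ran);
        pthread_attr_destroy(&attr);
        if (rc == 0)
            rc = pthread_join(t, NULL);
    }
    /* The application frees its own stack once the thread is joined. */
    munmap(stack, APP_STACK_SIZE);
    return rc ? -1 : 0;
}
```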

AFAICT the current shadow stack stuff in the kernel (and maybe the
underlying hardware mechanisms) is not usable.

