Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 6 Aug 2018 12:50:19 -0700
From: Kees Cook <keescook@...omium.org>
To: Ard Biesheuvel <ard.biesheuvel@...aro.org>
Cc: Robin Murphy <robin.murphy@....com>, 
	Kernel Hardening <kernel-hardening@...ts.openwall.com>, Mark Rutland <mark.rutland@....com>, 
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will.deacon@....com>, 
	Christoffer Dall <christoffer.dall@....com>, 
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, 
	Laura Abbott <labbott@...oraproject.org>, Julien Thierry <julien.thierry@....com>
Subject: Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
<ard.biesheuvel@...aro.org> wrote:
> On 6 August 2018 at 20:49, Kees Cook <keescook@...omium.org> wrote:
>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@....com> wrote:
>>> I guess what I'm getting at is that if the protection mechanism is "always
>>> return with SP outside TTBR1", there seems little point in going through the
>>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
>>> anyway; this is basically just me working through a justification for saying
>>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
>>> making it that much uglier for v8.0 CPUs...
>>
>> I think anyone with v8.0 CPUs interested in this mitigation would also
>> very much want PAN emulation. If a "depends on" isn't desired, what
>> about "imply" in the Kconfig?
>>
>
> Yes, but actually, using bit #0 is maybe a better alternative in any
> case. You can never dereference SP with bit #0 set, regardless of
> whether the address points to user or kernel space, and my concern
> about reloading sp from x29 doesn't really make sense, given that x29
> is always assigned from sp right after pushing x29 and x30 in the
> function prologue, and sp only gets restored from x29 in the epilogue
> when there is a stack frame to begin with, in which case we add #1 to
> sp again before returning from the function.

Fair enough! :)

> The other code gets a lot cleaner as well.
>
> So for the return we'll have
>
>   ldp     x29, x30, [sp], #nn
>>>add     sp, sp, #0x1
>   ret
>
> and for the function call
>
>   bl      <foo>
>>>mov      x30, sp
>>>bic     sp, x30, #1
>
> The restore sequence in entry.s:96 (which has no spare registers) gets
> much simpler as well:
>
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -95,6 +95,15 @@ alternative_else_nop_endif
>          */
>         add     sp, sp, x0      // sp' = sp + x0
>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
> +#ifdef CONFIG_ARM64_ROP_SHIELD
> +       tbnz    x0, #0, 1f
> +       .subsection     1
> +1:     sub     x0, x0, #1
> +       sub     sp, sp, #1
> +       b       2f
> +       .previous
> +2:
> +#endif
>         tbnz    x0, #THREAD_SHIFT, 0f
>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp

I get slightly concerned about "add" vs "clear bit", but I don't see a
real way to chain a lot of "add"s to get to avoid the unaligned
access. Is "or" less efficient than "add"?

-Kees

-- 
Kees Cook
Pixel Security

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ