kernel-hardening - Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKv+Gu-5ERHvqgXHR626QzKCZJPc0Cp5Ktv+gwucep5ZGvu9Vw@mail.gmail.com>
Date: Mon, 6 Aug 2018 21:54:32 +0200
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Kees Cook <keescook@...omium.org>
Cc: Robin Murphy <robin.murphy@....com>, 
	Kernel Hardening <kernel-hardening@...ts.openwall.com>, Mark Rutland <mark.rutland@....com>, 
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will.deacon@....com>, 
	Christoffer Dall <christoffer.dall@....com>, 
	linux-arm-kernel <linux-arm-kernel@...ts.infradead.org>, 
	Laura Abbott <labbott@...oraproject.org>, Julien Thierry <julien.thierry@....com>
Subject: Re: [RFC/PoC PATCH 0/3] arm64: basic ROP mitigation

On 6 August 2018 at 21:50, Kees Cook <keescook@...omium.org> wrote:
> On Mon, Aug 6, 2018 at 12:35 PM, Ard Biesheuvel
> <ard.biesheuvel@...aro.org> wrote:
>> On 6 August 2018 at 20:49, Kees Cook <keescook@...omium.org> wrote:
>>> On Mon, Aug 6, 2018 at 10:45 AM, Robin Murphy <robin.murphy@....com> wrote:
>>>> I guess what I'm getting at is that if the protection mechanism is "always
>>>> return with SP outside TTBR1", there seems little point in going through the
>>>> motions if SP in TTBR0 could still be valid and allow an attack to succeed
>>>> anyway; this is basically just me working through a justification for saying
>>>> the proposed scheme needs "depends on ARM64_PAN || ARM64_SW_TTBR0_PAN",
>>>> making it that much uglier for v8.0 CPUs...
>>>
>>> I think anyone with v8.0 CPUs interested in this mitigation would also
>>> very much want PAN emulation. If a "depends on" isn't desired, what
>>> about "imply" in the Kconfig?
>>>
>>
>> Yes, but actually, using bit #0 is maybe a better alternative in any
>> case. You can never dereference SP with bit #0 set, regardless of
>> whether the address points to user or kernel space, and my concern
>> about reloading sp from x29 doesn't really make sense, given that x29
>> is always assigned from sp right after pushing x29 and x30 in the
>> function prologue, and sp only gets restored from x29 in the epilogue
>> when there is a stack frame to begin with, in which case we add #1 to
>> sp again before returning from the function.
>
> Fair enough! :)
>
>> The other code gets a lot cleaner as well.
>>
>> So for the return we'll have
>>
>>   ldp     x29, x30, [sp], #nn
>>>>add     sp, sp, #0x1
>>   ret
>>
>> and for the function call
>>
>>   bl      <foo>
>>>>mov      x30, sp
>>>>bic     sp, x30, #1
>>
>> The restore sequence in entry.s:96 (which has no spare registers) gets
>> much simpler as well:
>>
>> --- a/arch/arm64/kernel/entry.S
>> +++ b/arch/arm64/kernel/entry.S
>> @@ -95,6 +95,15 @@ alternative_else_nop_endif
>>          */
>>         add     sp, sp, x0      // sp' = sp + x0
>>         sub     x0, sp, x0      // x0' = sp' - x0 = (sp + x0) - x0 = sp
>> +#ifdef CONFIG_ARM64_ROP_SHIELD
>> +       tbnz    x0, #0, 1f
>> +       .subsection     1
>> +1:     sub     x0, x0, #1
>> +       sub     sp, sp, #1
>> +       b       2f
>> +       .previous
>> +2:
>> +#endif
>>         tbnz    x0, #THREAD_SHIFT, 0f
>>         sub     x0, sp, x0      // x0'' = sp' - x0' = (sp + x0) - sp = x0
>>         sub     sp, sp, x0      // sp'' = sp' - x0 = (sp + x0) - x0 = sp
>
> I get slightly concerned about "add" vs "clear bit", but I don't see a
> real way to chain a lot of "add"s to get to avoid the unaligned
> access. Is "or" less efficient than "add"?
>

Yes. The stack pointer is special on arm64, and can only be used with
a limited set of ALU instructions. So orring #1 would involve 'mov
<reg>, sp ; orr sp, <reg>, #1' like in the 'bic' case above, which
requires a scratch register as well.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.