kernel-hardening - Re: Re: [RFC PATCH 6/6] arm64: add VMAP

Message-ID: <CAKv+Gu-FvfPFQooCie6HwP=mBng3C0jp9p8WMkFwTxctDu4JBA@mail.gmail.com>
Date: Fri, 14 Jul 2017 11:48:20 +0100
From: Ard Biesheuvel <ard.biesheuvel@...aro.org>
To: Mark Rutland <mark.rutland@....com>
Cc: Kernel Hardening <kernel-hardening@...ts.openwall.com>, 
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Takahiro Akashi <akashi.takahiro@...aro.org>, 
	Catalin Marinas <catalin.marinas@....com>, Dave Martin <dave.martin@....com>, 
	James Morse <james.morse@....com>, Laura Abbott <labbott@...oraproject.org>, 
	Will Deacon <will.deacon@....com>, Kees Cook <keescook@...omium.org>
Subject: Re: Re: [RFC PATCH 6/6] arm64: add VMAP_STACK and
 detect out-of-bounds SP

On 14 July 2017 at 11:32, Mark Rutland <mark.rutland@....com> wrote:
> On Thu, Jul 13, 2017 at 07:28:48PM +0100, Ard Biesheuvel wrote:
>> On 13 July 2017 at 18:55, Mark Rutland <mark.rutland@....com> wrote:
>> > On Thu, Jul 13, 2017 at 05:10:50PM +0100, Mark Rutland wrote:
>> >> On Thu, Jul 13, 2017 at 12:49:48PM +0100, Ard Biesheuvel wrote:
>> >> > On 13 July 2017 at 11:49, Mark Rutland <mark.rutland@....com> wrote:
>> >> > > On Thu, Jul 13, 2017 at 07:58:50AM +0100, Ard Biesheuvel wrote:
>> >> > >> On 12 July 2017 at 23:33, Mark Rutland <mark.rutland@....com> wrote:
>> >
>> >> > Given that the very first stp in kernel_entry will fault if we have
>> >> > less than S_FRAME_SIZE bytes of stack left, I think we should check
>> >> > that we have at least that much space available.
>> >>
>> >> I was going to reply saying that I didn't agree, but in writing up
>> >> examples, I mostly convinced myself that this is the right thing to do.
>> >> So I mostly agree!
>> >>
>> >> This would mean we treat the first impossible-to-handle exception as
>> >> that fatal case, which is similar to x86's double-fault, triggered when
>> >> the HW can't stack the regs. All other cases are just arbitrary faults.
>> >>
>> >> However, to provide that consistently, we'll need to perform this check
>> >> at every exception boundary, or some of those cases will result in a
>> >> recursive fault first.
>> >>
>> >> So I think there are three choices:
>> >>
>> >> 1) In el1_sync, only check SP bounds, and live with the recursive
>> >>    faults.
>> >>
>> >> 2) in el1_sync, check there's room for the regs, and live with the
>> >>    recursive faults for overflow on other exceptions.
>> >>
>> >> 3) In all EL1 entry paths, check there's room for the regs.
>> >
>> > FWIW, for the moment I've applied (2), as you suggested, to my
>> > arm64/vmap-stack branch, adding an additional:
>> >
>> >         sub     x0, x0, #S_FRAME_SIZE
>> >
>> > ... to the entry path.
>> >
>> > I think it's worth trying (3) so that we consistently report these
>> > cases, benchmarks permitting.
>> >
>>
>> OK, so here's a crazy idea: what if we
>> a) carve out a dedicated range in the VMALLOC area for stacks
>> b) for each stack, allocate a naturally aligned window of 2x the stack
>> size, and map the stack inside it, leaving the remaining space
>> unmapped
>
> This is not such a crazy idea. :)
>
> In fact, it was one I toyed with before getting lost on a register
> juggling tangent (see below).
>
>> That way, we can compare SP (minus S_FRAME_SIZE) against a mask that
>> is a build time constant, to decide whether its value points into a
>> stack or not. Of course, it may be pointing into the wrong stack, but
>> that should not prevent us from taking the exception, and we can deal
>> with that later. It would give us a very cheap way to perform this
>> test on the hot paths.
>
> The logical ops (TST) and conditional branches (TB(N)Z, CB(N)Z) operate
> on XZR rather than SP, so to do this we need to get the SP value into a
> GPR.
>
> Previously, I assumed this meant we needed to corrupt a GPR (and hence
> stash that GPR in a sysreg), so I started writing code to free sysregs.
>
> However, I now realise I was being thick, since we can stash the GPR
> in the SP:
>
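>         sub     sp, sp, x0      // sp = orig_sp - orig_x0
>         add     x0, sp, x0      // x0 = (orig_sp - orig_x0) + orig_x0 == orig_sp
>         sub     x0, x0, #S_FRAME_SIZE
>         tb(nz)  x0, #THREAD_SHIFT, overflow
>         add     x0, x0, #S_FRAME_SIZE   // x0 = orig_sp again
>         sub     x0, sp, x0      // x0 = (orig_sp - orig_x0) - orig_sp == -orig_x0
>         sub     sp, sp, x0      // sp = (orig_sp - orig_x0) + orig_x0 == orig_sp
>         neg     x0, x0          // x0 = orig_x0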
>
> ... so yes, this could work!
>

Nice!
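
For anyone following along, the register juggling can be modelled in
plain C with wrapping 64-bit arithmetic. This is only a sketch: struct
regs, stash_and_test() and the constant values are made up for
illustration (S_FRAME_SIZE in particular is a placeholder), and the
restore at the end is written as sub/sub/negate so that both values
come back exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define THREAD_SHIFT 14u                  /* 16 KB stacks */
#define S_FRAME_SIZE UINT64_C(0x130)      /* placeholder frame size */

/* Hypothetical model of the two registers involved; not kernel code. */
struct regs {
    uint64_t sp;
    uint64_t x0;
};

/*
 * Returns 1 if the overflow branch would be taken, 0 otherwise.
 * On the non-overflow path, r->sp and r->x0 hold their original values.
 * S = original sp, X = original x0; all arithmetic wraps modulo 2^64.
 */
int stash_and_test(struct regs *r)
{
    assert(r != 0);

    r->sp = r->sp - r->x0;          /* sp = S - X (x0 is stashed in sp)   */
    r->x0 = r->sp + r->x0;          /* x0 = (S - X) + X = S               */
    r->x0 -= S_FRAME_SIZE;          /* x0 = S - S_FRAME_SIZE              */

    if (r->x0 & (UINT64_C(1) << THREAD_SHIFT))
        return 1;                   /* would branch to the overflow path  */

    r->x0 += S_FRAME_SIZE;          /* x0 = S                             */
    r->x0 = r->sp - r->x0;          /* x0 = (S - X) - S = -X              */
    r->sp = r->sp - r->x0;          /* sp = (S - X) + X = S               */
    r->x0 = UINT64_C(0) - r->x0;    /* x0 = X                             */
    return 0;
}
```

The point of the exercise is that no scratch register or sysreg is
needed: the only state touched is SP and x0 themselves.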

> This means that we have to align the initial task, so the kernel Image
> will grow by THREAD_SIZE. Likewise for IRQ stacks, unless we can rework
> things such that we can dynamically allocate all of those.
>

We can't currently do that for 64k pages, since the segment alignment
is only 64k. But we should be able to patch that up, I think.

>> >> I believe that determining whether the exception was caused by a stack
>> >> overflow is not something we can do robustly or efficiently.
>>
>> Actually, if the stack pointer is within S_FRAME_SIZE of the base, and
>> the faulting address points into the guard page, that is a pretty
>> strong indicator that the stack overflowed. That shouldn't be too
>> costly?
>
> Sure, but that's still a heuristic. For example, that also catches an
> unrelated vmalloc address gone wrong, while SP was close to the end of
> the stack.
>

Yes, but the likelihood that an unrelated stray vmalloc access is
within 16 KB of a stack pointer that is close to its limit is
extremely low, so we should be able to live with the risk of
misidentifying it.
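
Roughly, the heuristic would look like the C sketch below. All names
here are hypothetical (looks_like_stack_overflow(), far, stack_base),
the guard is assumed to be a single 4 KB page directly below the stack,
and S_FRAME_SIZE is a placeholder value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define GUARD_SIZE   UINT64_C(0x1000)   /* one 4 KB guard page below the stack */
#define S_FRAME_SIZE UINT64_C(0x130)    /* placeholder frame size */

/*
 * sp         - stack pointer at the time of the fault
 * far        - faulting address
 * stack_base - lowest mapped address of the current stack
 *
 * Returns true when SP is within S_FRAME_SIZE of the base and the
 * faulting address lies in the guard page just below the stack.
 */
bool looks_like_stack_overflow(uint64_t sp, uint64_t far, uint64_t stack_base)
{
    bool sp_near_base = sp >= stack_base && sp - stack_base < S_FRAME_SIZE;
    bool far_in_guard = far < stack_base && stack_base - far <= GUARD_SIZE;

    return sp_near_base && far_in_guard;
}
```

A stray access that happens to hit the guard page while SP is near the
base would also return true here, which is exactly the misidentification
risk discussed above.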

> The important thing is whether we can *safely enter the exception* (i.e.
> stack the regs), or whether this'll push the SP (further) out-of-bounds.
> I think we agree that we can reliably and efficiently check this.
>

Yes.
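
As a sketch of that check in C, assuming each stack occupies the lower
half of a naturally aligned 2 * THREAD_SIZE window with the upper half
left unmapped: can_stack_regs() is a made-up name, and the THREAD_SHIFT
and S_FRAME_SIZE values are placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define THREAD_SHIFT 14u
#define THREAD_SIZE  (UINT64_C(1) << THREAD_SHIFT)   /* 16 KB */
#define S_FRAME_SIZE UINT64_C(0x130)                 /* placeholder frame size */

/*
 * True if a register frame pushed below sp still lands inside *a* stack.
 * With the assumed layout, bit THREAD_SHIFT is clear for every address
 * inside a stack and set for every address in the unmapped half of a
 * window, so the test is a single bit check against a build-time constant.
 */
bool can_stack_regs(uint64_t sp)
{
    return ((sp - S_FRAME_SIZE) & THREAD_SIZE) == 0;
}
```

This says nothing about whether SP points into the right stack, only
that stacking the regs will not fault, which is all the entry path
needs to know.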

> The general case of nominal "stack overflows" (e.g. large preidx
> decrements, proxied SP values, unrelated guard-page faults) is a
> semantic minefield. I don't think we should add code to try to
> distinguish these.
>
> For that general case, if we can enter the exception then we can try to
> handle the exception in the usual way, and either:
>
> * The fault code determines the access was bad. We at least kill the
>   thread.
>
> * We overflow the stack while trying to handle the exception, triggering
>   a new fault to triage.
>
> To make it possible to distinguish and debug these, we need to fix the
> backtracing code, but that's it.
>
> Thanks,
> Mark.