kernel-hardening - Re: Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrX+iQVjupq9NU5kOPypBBOSRziuvdGdnzCxTUXQkcFJcQ@mail.gmail.com>
Date: Sun, 9 Apr 2017 17:31:36 -0700
From: Andy Lutomirski <luto@...nel.org>
To: PaX Team <pageexec@...email.hu>
Cc: Daniel Micay <danielmicay@...il.com>, Andy Lutomirski <luto@...nel.org>, 
	Mathias Krause <minipli@...glemail.com>, Thomas Gleixner <tglx@...utronix.de>, 
	Kees Cook <keescook@...omium.org>, 
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>, Mark Rutland <mark.rutland@....com>, 
	Hoeun Ryu <hoeun.ryu@...il.com>, Emese Revfy <re.emese@...il.com>, 
	Russell King <linux@...linux.org.uk>, X86 ML <x86@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, 
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, 
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write_begin/unmap()

On Sun, Apr 9, 2017 at 1:24 PM, PaX Team <pageexec@...email.hu> wrote:
>
>> In the context of virtually mapped stacks / KSTACKOVERFLOW, this
>> naturally leads to different solutions.  The upstream kernel had a
>> bunch of buggy drivers that played badly with virtually mapped stacks.
>> grsecurity sensibly went for the approach where the buggy drivers kept
>> working.  The upstream kernel went for the approach of fixing the
>> drivers rather than keeping a compatibility workaround.  Different
>> constraints, different solutions.
>
> except that's not what happened at all. spender's first version did just
> a vmalloc for the kstack like the totally NIH'd version upstream does
> now. while we always anticipated buggy dma users and thus had code that
> would detect them so that we could fix them, we quickly figured that the
> upstream kernel wasn't quite up to snuff as we had assumed and faced with
> the amount of buggy code, we went for the current vmap approach which
> kept users' systems working instead of breaking them.
>
> you're trying to imply that upstream fixed the drivers but as the facts
> show, that's not true. you simply unleashed your code on the world and
> hoped(?) that enough suckers would try it out during the -rc window. as
> we all know several releases and almost a year later, that was a losing
> bet as you still keep fixing those drivers (and something tells me that
> we haven't seen the end of it). this is simply irresponsible engineering
> for no technical reason.

I consider breaking buggy drivers (in a way that they either generally
work okay or that they break with a nice OOPS depending on config) to
be better than having a special case in what's supposed to be a fast
path to keep them working.  I did consider forcing the relevant debug
options on for a while just to help shake these bugs out the woodwork
faster.

>
>> In the case of rare writes or pax_open_kernel [1] or whatever we want
>> to call it, CR3 would work without arch-specific code, and CR0 would
>> not.  That's an argument for CR3 that would need to be countered by
>> something.  (Sure, avoiding leaks either way might need arch changes.
>> OTOH, a *randomized* CR3-based approach might not have as much of a
>> leak issue to begin with.)
>
> i have yet to see anyone explain what they mean by 'leak' here but if it
> is what i think it is then the arch specific entry/exit changes are not
> optional but mandatory. see below for randomization.

By "leak" I mean that a bug or exploit causes unintended code to run
with CR0.WP or a special CR3 or a special PTE or whatever loaded.  PaX
hooks the entry code to avoid leaks.

>> At boot, choose a random address A.
>
> what is the threat that a random address defends against?

Makes it harder to exploit a case where the CR3 setting leaks.

>
>>  Create an mm_struct that has a
>> single VMA starting at A that represents the kernel's rarely-written
>> section.  Compute O = (A - VA of rarely-written section).  To do a
>> rare write, use_mm() the mm, write to (VA + O), then unuse_mm().
>
> the problem is that the amount of __read_only data extends beyond vmlinux,
> i.e., this approach won't scale. another problem is that it can't be used
> inside use_mm and switch_mm themselves (no read-only task structs or percpu
> pgd for you ;) and probably several other contexts.

Can you clarify these uses that extend beyond vmlinux?  I haven't
looked at the grsecurity patch extensively.  Are you talking about the
BPF JIT stuff?  If so, I think that should possibly be handled a bit
differently, since I think the normal
write-to-rare-write-vmlinux-sections primitive should preferably *not*
be usable to write to executable pages.  Using a real mm_struct for
this could help.

>
> last but not least, use_mm says this about itself:
>
>     (Note: this routine is intended to be called only
>     from a kernel thread context)
>
> so using it will need some engineering (or the comment be fixed).

Indeed.

>> It has the added benefit that writes to non-rare-write data using the
>> rare-write primitive will fail.
>
> what is the threat model you're assuming for this feature? based on what i
> have for PaX (arbitrary read/write access exploited for data-only attacks),
> the above makes no sense to me...
>

If I use the primitive to try to write a value to the wrong section
(write to kernel text, for example), IMO it would be nice to OOPS
instead of succeeding.

Please keep in mind that, unlike PaX, uses of a pax_open_kernel()-like
function will may be carefully audited by a friendly security expert
such as yourself.  It would be nice to harden the primitive to a
reasonable extent against minor misuses such as putting it in a
context where the compiler will emit mov-a-reg-with-WP-set-to-CR0;
ret.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.