kernel-hardening - Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages

Message-ID: <7b2ef6c657d2ab32c221f2ecbf69e8221e3dc844.camel@intel.com>
Date: Fri, 7 Dec 2018 03:06:22 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "keescook@...omium.org" <keescook@...omium.org>, "luto@...nel.org"
	<luto@...nel.org>, "nadav.amit@...il.com" <nadav.amit@...il.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"daniel@...earbox.net" <daniel@...earbox.net>, "ard.biesheuvel@...aro.org"
	<ard.biesheuvel@...aro.org>, "ast@...nel.org" <ast@...nel.org>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>, "jeyu@...nel.org"
	<jeyu@...nel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>,
	"jannh@...gle.com" <jannh@...gle.com>, "Dock, Deneen T"
	<deneen.t.dock@...el.com>, "peterz@...radead.org" <peterz@...radead.org>,
	"kristen@...ux.intel.com" <kristen@...ux.intel.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"igor.stoppa@...il.com" <igor.stoppa@...il.com>, "tycho@...ho.ws"
	<tycho@...ho.ws>, "will.deacon@....com" <will.deacon@....com>,
	"mingo@...hat.com" <mingo@...hat.com>, "Keshavamurthy, Anil S"
	<anil.s.keshavamurthy@...el.com>, "kernel-hardening@...ts.openwall.com"
	<kernel-hardening@...ts.openwall.com>, "mhiramat@...nel.org"
	<mhiramat@...nel.org>, "naveen.n.rao@...ux.vnet.ibm.com"
	<naveen.n.rao@...ux.vnet.ibm.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Hansen, Dave" <dave.hansen@...el.com>
Subject: Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages

On Thu, 2018-12-06 at 15:08 -0800, Nadav Amit wrote:
> > On Dec 6, 2018, at 12:17 PM, Andy Lutomirski <luto@...nel.org> wrote:
> > 
> > On Thu, Dec 6, 2018 at 11:39 AM Nadav Amit <nadav.amit@...il.com> wrote:
> > > > On Dec 6, 2018, at 11:19 AM, Andy Lutomirski <luto@...nel.org> wrote:
> > > > 
> > > > On Thu, Dec 6, 2018 at 11:01 AM Tycho Andersen <tycho@...ho.ws> wrote:
> > > > > On Thu, Dec 06, 2018 at 10:53:50AM -0800, Andy Lutomirski wrote:
> > > > > > > If we are going to unmap the linear alias, why not do it at
> > > > > > > vmalloc() time rather than vfree() time?
> > > > > > 
> > > > > > That’s not totally nuts. Do we ever have code that expects __va() to
> > > > > > work on module data?  Perhaps crypto code trying to encrypt static
> > > > > > data because our APIs don’t understand virtual addresses.  I guess if
> > > > > > highmem is ever used for modules, then we should be fine.
> > > > > > 
> > > > > > RO instead of not present might be safer.  But I do like the idea of
> > > > > > renaming Rick's flag to something like VM_XPFO or VM_NO_DIRECT_MAP and
> > > > > > making it do all of this.
> > > > > 
> > > > > Yeah, doing it for everything automatically seemed like it was/is
> > > > > going to be a lot of work to debug all the corner cases where things
> > > > > expect memory to be mapped but don't explicitly say it. And in
> > > > > particular, the XPFO series only does it for user memory, whereas an
> > > > > additional flag like this would work for extra paranoid allocations
> > > > > of kernel memory too.
> > > > 
> > > > I just read the code, and it looks like vmalloc() is already using
> > > > highmem (__GFP_HIGHMEM) if available, so, on big x86_32 systems, for
> > > > example, we already don't have modules in the direct map.
> > > > 
> > > > So I say we go for it.  This should be quite simple to implement --
> > > > the pageattr code already has almost all the needed logic on x86.  The
> > > > only arch support we should need is a pair of functions to remove a
> > > > vmalloc address range from the address map (if it was present in the
> > > > first place) and a function to put it back.  On x86, this should only
> > > > be a few lines of code.
> > > > 
> > > > What do you all think?  This should solve most of the problems we have.
> > > > 
> > > > If we really wanted to optimize this, we'd make it so that
> > > > module_alloc() allocates memory the normal way, then, later on, we
> > > > call some function that, all at once, removes the memory from the
> > > > direct map and applies the right permissions to the vmalloc alias (or
> > > > just makes the vmalloc alias not-present so we can add permissions
> > > > later without flushing), and flushes the TLB.  And we arrange for
> > > > vunmap to zap the vmalloc range, then put the memory back into the
> > > > direct map, then free the pages back to the page allocator, with the
> > > > flush in the appropriate place.
> > > > 
> > > > I don't see why the page allocator needs to know about any of this.
> > > > It's already okay with the permissions being changed out from under it
> > > > on x86, and it seems fine.  Rick, do you want to give some variant of
> > > > this a try?
> > > 
> > > Setting it as read-only may work (and already happens for the read-only
> > > module data). I am not sure about setting it as non-present.
> > > 
> > > At some point, a discussion about a threat-model, as Rick indicated, would
> > > be required. I presume ROP attacks can easily call
> > > set_all_modules_text_rw() and override all the protections.
> > 
> > I am far from an expert on exploit techniques, but here's a
> > potentially useful model: let's assume there's an attacker who can
> > write controlled data to a controlled kernel address but cannot
> > directly modify control flow.  It would be nice for such an attacker
> > to have a very difficult time of modifying kernel text or of
> > compromising control flow.  So we're assuming a feature like kernel
> > CET or that the attacker finds it very difficult to do something like
> > modifying some thread's IRET frame.
> > 
> > Admittedly, for the kernel, this is an odd threat model, since an
> > attacker can presumably quite easily learn the kernel stack address of
> > one of their tasks, do some syscall, and then modify their kernel
> > thread's stack such that it will IRET right back to a fully controlled
> > register state with RSP pointing at an attacker-supplied kernel stack.
> > So this threat model gives very strong ROP powers, unless we have
> > either CET or some software technique to harden all the RET
> > instructions in the kernel.
> > 
> > I wonder if there's a better model to use.  Maybe with stack-protector
> > we get some degree of protection?  Or is all of this rather weak
> > until we have CET or a RAP-like feature?
> 
> I believe that seeing the end-goal would make reasoning about patches
> easier, otherwise the complaint “but anyhow it’s all insecure” keeps popping
> up.
> 
> I’m not sure CET or other CFI would be enough even with this threat-model.
> The page-tables (at the very least) need to be write-protected, as otherwise
> controlled data writes may just modify them. There are various possible
> solutions I presume: write_rare for page-tables, hypervisor-assisted
> security to obtain physical level NX/RO (a-la Microsoft VBS) or some sort of
> hardware enclave.
> 
> What do you think?

I am not sure which issue you are talking about. I think there are actually two
separate issues here, and their discussions got merged because both overlap with
the fix for the teardown W^X window.

For the W^X stuff, I had originally imagined the protection was for when an
attacker has a limited bug that could write to a location in the module space,
but not to other locations, due to only having the ability to overwrite part of
a pointer or something like that. The module would then execute the new code as
it ran normally after it finished loading. That is why I was wondering about the
RW window during load. It still seems generally sensible to enforce W^X, though.
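
To make the "overwrite part of a pointer" case concrete, here is a minimal
userspace sketch (not kernel code). The helper names and the example address are
made up for illustration, and it assumes a 64-bit pointer where the bug can only
replace the low 16 bits, so the corrupted pointer stays inside the 64 KiB window
it started in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a limited write bug: only the low 16 bits of a
 * 64-bit pointer can be replaced; the upper 48 bits are out of reach. */
static inline uint64_t partial_overwrite(uint64_t ptr, uint16_t attacker_low)
{
	return (ptr & ~UINT64_C(0xffff)) | attacker_low;
}

/* True if both addresses fall in the same 64 KiB-aligned window. */
static inline int same_64k_window(uint64_t a, uint64_t b)
{
	return (a >> 16) == (b >> 16);
}

/* Usage: whatever the attacker picks, the target stays in the window. */
static inline void partial_overwrite_selfcheck(void)
{
	uint64_t ptr = UINT64_C(0xffffffffc0001234);	/* made-up address */

	assert(same_64k_window(ptr, partial_overwrite(ptr, 0x0000)));
	assert(same_64k_window(ptr, partial_overwrite(ptr, 0xffff)));
}
```

Under that assumption, the attacker can redirect the write only to nearby
memory, which is why a writable-and-executable region close to the original
target matters in this model.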

I like your idea about something like text_poke to load modules. I think maybe
my modules KASLR patchset could help the above somewhat too since it loads at a
freshly randomized address.

The issue with the freed pages before the flush (the original source of this
thread) doesn't require a write bug to insert the code, but does require a way
to jump to it, so it's kind of the opposite model of the above. That's why I
think they are different.
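
The ordering problem can be sketched with a toy model (plain userspace C, not
kernel code). The struct and function names are invented for illustration; the
model assumes a single page with one executable alias and only tracks whether
the page was ever in the free pool while a stale executable translation could
still be cached.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one page that had an executable vmalloc alias. */
struct toy_page {
	bool tlb_exec_cached;	/* a CPU may still hold an executable translation */
	bool in_free_pool;	/* page has been returned to the allocator */
};

/* Hypothetical teardown. Returns true if the page was ever in the free
 * pool while a stale executable translation was still cached. */
static inline bool toy_teardown(struct toy_page *pg, bool flush_before_free)
{
	bool window = false;

	assert(pg);
	if (flush_before_free)
		pg->tlb_exec_cached = false;	/* flush first */

	pg->in_free_pool = true;		/* release the page */
	if (pg->tlb_exec_cached)
		window = true;			/* reusable page, still executable */

	pg->tlb_exec_cached = false;		/* late flush (no-op if done above) */
	return window;
}
```

In this model, toy_teardown(&pg, false) reports a window and
toy_teardown(&pg, true) does not, which is the property the flush-before-release
flag is meant to provide.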

I am still learning a lot about kernel exploits, though; maybe Kees can provide
some better insight here?

Thanks,

Rick