Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 4 Dec 2018 23:51:38 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "nadav.amit@...il.com" <nadav.amit@...il.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"daniel@...earbox.net" <daniel@...earbox.net>, "ard.biesheuvel@...aro.org"
	<ard.biesheuvel@...aro.org>, "jeyu@...nel.org" <jeyu@...nel.org>,
	"rostedt@...dmis.org" <rostedt@...dmis.org>, "ast@...nel.org"
	<ast@...nel.org>, "linux-mm@...ck.org" <linux-mm@...ck.org>, "Dock, Deneen T"
	<deneen.t.dock@...el.com>, "jannh@...gle.com" <jannh@...gle.com>,
	"kristen@...ux.intel.com" <kristen@...ux.intel.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"peterz@...radead.org" <peterz@...radead.org>, "will.deacon@....com"
	<will.deacon@....com>, "mingo@...hat.com" <mingo@...hat.com>,
	"luto@...nel.org" <luto@...nel.org>, "Keshavamurthy, Anil S"
	<anil.s.keshavamurthy@...el.com>, "kernel-hardening@...ts.openwall.com"
	<kernel-hardening@...ts.openwall.com>, "mhiramat@...nel.org"
	<mhiramat@...nel.org>, "naveen.n.rao@...ux.vnet.ibm.com"
	<naveen.n.rao@...ux.vnet.ibm.com>, "davem@...emloft.net"
	<davem@...emloft.net>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Hansen, Dave" <dave.hansen@...el.com>
Subject: Re: [PATCH 1/2] vmalloc: New flag for flush before releasing pages

On Tue, 2018-12-04 at 12:36 -0800, Nadav Amit wrote:
> > On Dec 4, 2018, at 12:02 PM, Edgecombe, Rick P <rick.p.edgecombe@...el.com>
> > wrote:
> > 
> > On Tue, 2018-12-04 at 16:03 +0000, Will Deacon wrote:
> > > On Mon, Dec 03, 2018 at 05:43:11PM -0800, Nadav Amit wrote:
> > > > > On Nov 27, 2018, at 4:07 PM, Rick Edgecombe <
> > > > > rick.p.edgecombe@...el.com>
> > > > > wrote:
> > > > > 
> > > > > Since vfree will lazily flush the TLB, but not lazily free the
> > > > > underlying
> > > > > pages,
> > > > > it often leaves stale TLB entries to freed pages that could get re-
> > > > > used.
> > > > > This is
> > > > > undesirable for cases where the memory being freed has special
> > > > > permissions
> > > > > such
> > > > > as executable.
> > > > 
> > > > So I am trying to finish my patch-set for preventing transient W+X
> > > > mappings
> > > > from taking space, by handling kprobes & ftrace that I missed (thanks
> > > > again
> > > > for
> > > > pointing it out).
> > > > 
> > > > But all of the sudden, I don’t understand why we have the problem that
> > > > this
> > > > (your) patch-set deals with at all. We already change the mappings to
> > > > make
> > > > the memory writable before freeing the memory, so why can’t we make it
> > > > non-executable at the same time? Actually, why do we make the module
> > > > memory,
> > > > including its data executable before freeing it???
> > > 
> > > Yeah, this is really confusing, but I have a suspicion it's a combination
> > > of the various different configurations and hysterical raisins. We can't
> > > rely on module_alloc() allocating from the vmalloc area (see nios2) nor
> > > can we rely on disable_ro_nx() being available at build time.
> > > 
> > > If we *could* rely on module allocations always using vmalloc(), then
> > > we could pass in Rick's new flag and drop disable_ro_nx() altogether
> > > afaict -- who cares about the memory attributes of a mapping that's about
> > > to disappear anyway?
> > > 
> > > Is it just nios2 that does something different?
> > > 
> > > Will
> > 
> > Yea it is really intertwined. I think for x86, set_memory_nx everywhere
> > would
> > solve it as well, in fact that was what I first thought the solution should
> > be
> > until this was suggested. It's interesting that from the other thread Masami
> > Hiramatsu referenced, set_memory_nx was suggested last year and would have
> > inadvertently blocked this on x86. But, on the other architectures I have
> > since
> > learned it is a bit different.
> > 
> > It looks like actually most arch's don't re-define set_memory_*, and so all
> > of
> > the frob_* functions are actually just noops. In which case allocating RWX
> > is
> > needed to make it work at all, because that is what the allocation is going
> > to
> > stay at. So in these archs, set_memory_nx won't solve it because it will do
> > nothing.
> > 
> > On x86 I think you cannot get rid of disable_ro_nx fully because there is
> > the
> > changing of the permissions on the directmap as well. You don't want some
> > other
> > caller getting a page that was left RO when freed and then trying to write
> > to
> > it, if I understand this.
> > 
> > The other reasoning was that calling set_memory_nx isn't doing what we are
> > actually trying to do which is prevent the pages from getting released too
> > early.
> > 
> > A more clear solution for all of this might involve refactoring some of the
> > set_memory_ de-allocation logic out into __weak functions in either modules
> > or
> > vmalloc. As Jessica points out in the other thread though, modules does a
> > lot
> > more stuff there than the other module_alloc callers. I think it may take
> > some
> > thought to centralize AND make it optimal for every
> > module_alloc/vmalloc_exec
> > user and arch.
> > 
> > But for now with the change in vmalloc, we can block the executable mapping
> > freed page re-use issue in a cross platform way.
> 
> Please understand me correctly - I didn’t mean that your patches are not
> needed.
Ok, I think I understand. I have been pondering these same things after Masami
Hiramatsu's comments on this thread the other day.

> All I did is asking - how come the PTEs are executable when they are cleared
> they are executable, when in fact we manipulate them when the module is
> removed.
I think the directmap used to be RWX so maybe historically its trying to return
it to its default state? Not sure.

> I think I try to deal with a similar problem to the one you encounter -
> broken W^X. The only thing that bothered me in regard to your patches (and
> only after I played with the code) is that there is still a time-window in
> which W^X is broken due to disable_ro_nx().
> 
Totally agree there is overlap in the fixes and we should sync.

What do you think about Andy's suggestion for doing the vfree cleanup in vmalloc
with arch hooks? So the allocation goes into vfree fully setup and vmalloc frees
it and on x86 resets the direct map.

Thanks,

Rick

Powered by blists - more mailing lists

Your e-mail address:

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.