kernel-hardening - Re: [PATCH 10/17] prmem: documentation

Message-ID: <CALCETrW1FcT88u0QY9k_PMOeZj0G8H6KNb5Bnreo-12NvmmCEQ@mail.gmail.com>
Date: Tue, 30 Oct 2018 21:41:13 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Matthew Wilcox <willy@...radead.org>
Cc: Igor Stoppa <igor.stoppa@...il.com>, Tycho Andersen <tycho@...ho.ws>, 
	Kees Cook <keescook@...omium.org>, Peter Zijlstra <peterz@...radead.org>, 
	Mimi Zohar <zohar@...ux.vnet.ibm.com>, Dave Chinner <david@...morbit.com>, 
	James Morris <jmorris@...ei.org>, Michal Hocko <mhocko@...nel.org>, 
	Kernel Hardening <kernel-hardening@...ts.openwall.com>, 
	linux-integrity <linux-integrity@...r.kernel.org>, 
	LSM List <linux-security-module@...r.kernel.org>, 
	Igor Stoppa <igor.stoppa@...wei.com>, Dave Hansen <dave.hansen@...ux.intel.com>, 
	Jonathan Corbet <corbet@....net>, Laura Abbott <labbott@...hat.com>, 
	Randy Dunlap <rdunlap@...radead.org>, Mike Rapoport <rppt@...ux.vnet.ibm.com>, 
	"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 10/17] prmem: documentation

On Tue, Oct 30, 2018 at 2:36 PM Matthew Wilcox <willy@...radead.org> wrote:
>
> On Tue, Oct 30, 2018 at 10:43:14PM +0200, Igor Stoppa wrote:
> > On 30/10/2018 21:20, Matthew Wilcox wrote:
> > > > > So the API might look something like this:
> > > > >
> > > > >         void *p = rare_alloc(...);      /* writable pointer */
> > > > >         p->a = x;
> > > > >         q = rare_protect(p);            /* read-only pointer */
> >
> > With pools and memory allocated from vmap_areas, I was able to say
> >
> > protect(pool)
> >
> > and that would do a sweep on all the pages currently in use.
> > In the SELinux policyDB, for example, one doesn't really want to
> > individually protect each allocation.
> >
> > The loading phase happens usually at boot, when the system can be assumed to
> > be sane (one might even preload a bare-bone set of rules from initramfs and
> > then replace it later on, with the full blown set).
> >
> > There is no need to process each of these tens of thousands of allocations
> > and initializations as write-rare.
> >
> > Would it be possible to do the same here?
>
> What Andy is proposing effectively puts all rare allocations into
> one pool.  Although I suppose it could be generalised to multiple pools
> ... one mm_struct per pool.  Andy, what do you think to doing that?

Hmm.  Let's see.

To clarify some of this thread, I think that the fact that rare_write
uses an mm_struct and alias mappings under the hood should be
completely invisible to users of the API.  No one should ever be
handed a writable pointer to rare_write memory (except perhaps during
bootup or when initializing a large complex data structure that will
be rare_write but isn't yet, e.g. the policy db).
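
As a rough user-space illustration of that shape (not the prmem patch set
or any real kernel interface), callers could hold only const pointers and
funnel every update through one function.  All names below (struct policy,
rare_alloc_init, rare_write) are hypothetical, and the "backend" is a plain
cast where a kernel might use an alias mapping or something else entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example payload; stands in for any write-rare object. */
struct policy {
	int a;
	int b;
};

/*
 * Hypothetical allocator: copies an initial image and hands back a
 * read-only pointer.  Callers never receive a writable pointer.
 */
static const void *rare_alloc_init(const void *init, size_t size)
{
	void *p;

	assert(init != NULL && size > 0);
	p = malloc(size);
	if (p)
		memcpy(p, init, size);
	return p;
}

/*
 * Hypothetical single write path.  How the write is carried out is
 * hidden here; this stand-in simply casts away const, which is valid
 * only because the underlying object came from malloc().
 */
static void rare_write(const void *dst, const void *src, size_t size)
{
	assert(dst != NULL && src != NULL);
	memcpy((void *)dst, src, size);
}
```

A caller would then write something like rare_write(&q->a, &x, sizeof(x))
on a const struct policy *q, and nothing in that call reveals how the
write is actually performed.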

For example, there could easily be architectures where having a
writable alias is problematic.  On such architectures, an entirely
different mechanism might work better.  And, if a tool like KNOX ever
becomes a *part* of the Linux kernel (hint hint!)

If you have multiple pools and one mm_struct per pool, you'll need a
way to find the mm_struct from a given allocation.  Regardless of how
the mm_structs are set up, changing rare_write memory to normal memory
or vice versa will require a global TLB flush (all ASIDs and global
pages) on all CPUs, so having extra mm_structs doesn't seem to buy
much.
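
To make the lookup cost concrete, here is a minimal user-space sketch of
"find the mm from an allocation", assuming each pool covers one contiguous
address range.  The types and names (struct rare_mm, struct rare_pool,
rare_find_mm) are invented for illustration, and a linear scan stands in
for whatever lookup structure a real implementation would use:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-pool mm_struct. */
struct rare_mm {
	int id;
};

/* Hypothetical pool descriptor: one contiguous range, one mm. */
struct rare_pool {
	uintptr_t start;
	size_t size;
	struct rare_mm *mm;
};

/*
 * Return the mm owning addr, or NULL if no pool contains it.
 * Every write through the API would need this extra step once
 * there is more than one pool.
 */
static struct rare_mm *rare_find_mm(const struct rare_pool *pools,
				    size_t npools, const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	size_t i;

	assert(pools != NULL || npools == 0);
	for (i = 0; i < npools; i++) {
		/* Written as a subtraction to avoid overflow in start + size. */
		if (a >= pools[i].start && a - pools[i].start < pools[i].size)
			return pools[i].mm;
	}
	return NULL;
}
```

With a single shared pool this lookup disappears, which is part of why
one mm_struct per pool adds bookkeeping without an obvious payoff.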

(It's just possible that changing rare_write back to normal might be
able to avoid the flush if the spurious faults can be handled
reliably.)