kernel-hardening - Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

Message-ID: <20160830205552.GB71063@ast-mbp.thefacebook.com>
Date: Tue, 30 Aug 2016 13:55:55 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Mickaël Salaün <mic@...ikod.net>
Cc: Andy Lutomirski <luto@...capital.net>,
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>,
	Alexei Starovoitov <ast@...nel.org>, Tejun Heo <tj@...nel.org>,
	Sargun Dhillon <sargun@...gun.me>,
	Network Development <netdev@...r.kernel.org>,
	Linux API <linux-api@...r.kernel.org>,
	Kees Cook <keescook@...omium.org>,
	LSM List <linux-security-module@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
	"David S . Miller" <davem@...emloft.net>,
	Daniel Mack <daniel@...que.org>,
	Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> 
> 
> On 30/08/2016 22:23, Andy Lutomirski wrote:
> > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün <mic@...ikod.net> wrote:
> >>
> >> On 30/08/2016 20:55, Andy Lutomirski wrote:
> >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün <mic@...ikod.net> wrote:
> >>>>
> >>>>
> >>>> On 28/08/2016 10:13, Andy Lutomirski wrote:
> >>>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün" <mic@...ikod.net> wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> >>>>>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> >>>>>>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> >>>>>>>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>>>>>>>>> As said above, Landlock will not run an eBPF program when not strictly
> >>>>>>>>>> needed. Attaching to a cgroup will have the same performance impact as
> >>>>>>>>>> attaching to a process hierarchy.
> >>>>>>>>>
> >>>>>>>>> Having a prog per cgroup per lsm_hook is the only scalable way I
> >>>>>>>>> could come up with. If you see another way, please propose.
> >>>>>>>>> current->seccomp.landlock_prog is not the answer.
> >>>>>>>>
> >>>>>>>> Hum, I don't see the difference from a performance point of view between
> >>>>>>>> a cgroup-based or a process hierarchy-based system.
> >>>>>>>>
> >>>>>>>> Maybe a better option should be to use an array of pointers with N
> >>>>>>>> entries, one for each supported hook, instead of a unique pointer list?
> >>>>>>>
> >>>>>>> yes, clearly array dereference is faster than linked list walk.
> >>>>>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> >>>>>>> Since we cannot keep it inside task_struct, we have to allocate it.
> >>>>>>> Every time the task is created then. What to do on the fork? That
> >>>>>>> will require changes all over. Then the obvious optimization would be
> >>>>>>> to share this allocated array of prog pointers across multiple tasks...
> >>>>>>> and little by little this new facility will look like cgroup.
> >>>>>>> Hence the suggestion to put this array into cgroup from the start.
> >>>>>>
> >>>>>> I see your point :)
> >>>>>>
> >>>>>>>
> >>>>>>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> >>>>>>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> >>>>>>>> to use a process hierarchy). The downside will be to handle an LSM hook
> >>>>>>>> program which is not triggered by a seccomp-filter, but this should be
> >>>>>>>> needed anyway to handle interruptions.
> >>>>>>>
> >>>>>>> what do you mean 'not triggered by seccomp' ?
> >>>>>>> You're not suggesting that this lsm has to enable seccomp to be functional?
> >>>>>>> imo that's non starter due to overhead.
> >>>>>>
> >>>>>> Yes, for now, it is triggered by a new seccomp filter return value
> >>>>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
> >>>>>> be needed but could be useful to bind a seccomp filter security policy
> >>>>>> with a Landlock one. Waiting for Kees's point of view…
> >>>>>>
> >>>>>
> >>>>> I'm not Kees, but I'd be okay with that.  I still think that doing
> >>>>> this by process hierarchy a la seccomp will be easier to use and to
> >>>>> understand (which is quite important for this kind of work) than doing
> >>>>> it by cgroup.
> >>>>>
> >>>>> A feature I've wanted to add for a while is to have an fd that
> >>>>> represents a seccomp layer, the idea being that you would set up your
> >>>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
> >>>>> would have a syscall to install that layer.  Then an unprivileged
> >>>>> sandbox manager could set up its layer and still be able to inject new
> >>>>> processes into it later on, no cgroups needed.
> >>>>
> >>>> A nice thing I didn't highlight about Landlock is that a process can
> >>>> prepare a layer of rules (arraymap of handles + Landlock programs) and
> >>>> pass the file descriptors of the Landlock programs to another process.
> >>>> This process could then apply these programs to get sandboxed. However,
> >>>> for now, because a Landlock program is only triggered by a seccomp
> >>>> filter (which does not follow the Landlock programs as a FD), they will be
> >>>> useless.
> >>>>
> >>>> The FD referring to an arraymap of handles can also be used to update a
> >>>> map and change the behavior of a Landlock program. A master process can
> >>>> then add or remove restrictions to another process hierarchy on the fly.
> >>>
> >>> Maybe this could be extended a little bit.  The fd could hold the
> >>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
> >>> the ability to install it and FMODE_WRITE could give the ability to
> >>> modify it.
> >>>
> >>
> >> This is interesting! It should be possible to append the seccomp stack
> >> of a source process to the seccomp stack of the target process when a
> >> Landlock program is passed and then activated through seccomp(2).
> >>
> >> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
> >> permission of the eBPF program FD in a specific way?
> >>
> > 
> > This wouldn't be an eBPF program FD -- it would be an FD encapsulating
> > an entire configuration including seccomp BPF program, whatever
> > landlock stuff is associated, and eventual seccomp monitor
> > configuration (once I write that code), etc.
> > 
> > You wouldn't say "attach this process's seccomp stack to me" -- you'd
> > say "attach this seccomp layer to me".
> > 
> > A decision that we'd have to make would be whether the FD links to the
> > parent layer or whether it can be attached without regard to what the
> > parent layer is.
> 
> OK, I like that, but I think it could be done in a second step. :)

I don't. Single FD that is a collection of objects seems an odd abstraction
to me. I also don't see what it actually solves.
I think lsm and seccomp should be orthogonal and not tied into each other.
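
For readers following the thread, the prog_array[num_lsm_hooks] idea quoted
above can be illustrated with a small userspace model. This is only a sketch
under stated assumptions, not code from any poster or from the kernel: the
names hook_id, cgroup_model, task_model, run_hook, fork_task and deny_etc are
all hypothetical, and a plain C function pointer stands in for an eBPF
program. It shows the two properties discussed: a hook costs one array
dereference, and tasks share the array through their cgroup, so fork has
nothing to allocate or copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook identifiers; not the kernel's real LSM hook list. */
enum hook_id { HOOK_FILE_OPEN, HOOK_FILE_PERMISSION, NUM_HOOKS };

/* Stand-in for an eBPF program: returns 0 to allow, nonzero to deny. */
typedef int (*hook_prog_t)(const char *path);

/* Hypothetical cgroup model: one program slot per supported hook. */
struct cgroup_model {
    hook_prog_t progs[NUM_HOOKS];
};

/* A task only points at its cgroup; it owns no program array itself. */
struct task_model {
    struct cgroup_model *cgrp;
};

/* Example policy (assumption for illustration): deny paths under /etc/. */
static int deny_etc(const char *path)
{
    return strncmp(path, "/etc/", 5) == 0 ? -1 : 0;
}

/* One array dereference per hook; an empty slot means nothing runs. */
static int run_hook(const struct task_model *task, enum hook_id id,
                    const char *path)
{
    assert(id < NUM_HOOKS);
    hook_prog_t prog = task->cgrp->progs[id];
    return prog ? prog(path) : 0;
}

/* On fork the child joins the parent's cgroup and shares its array. */
static void fork_task(const struct task_model *parent,
                      struct task_model *child)
{
    child->cgrp = parent->cgrp;
}
```

In this model, installing deny_etc in the HOOK_FILE_OPEN slot of one
cgroup_model affects every task_model pointing at it, while hooks with an
empty slot fall through without running anything.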