kernel-hardening - Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged sandboxing

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180402003947.GE5586@cisco>
Date: Sun, 1 Apr 2018 18:39:47 -0600
From: Tycho Andersen <tycho@...ho.ws>
To: Mickaël Salaün <mic@...ikod.net>
Cc: Andy Lutomirski <luto@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	Alexei Starovoitov <ast@...nel.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Casey Schaufler <casey@...aufler-ca.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	David Drysdale <drysdale@...gle.com>,
	"David S . Miller" <davem@...emloft.net>,
	"Eric W . Biederman" <ebiederm@...ssion.com>,
	James Morris <james.l.morris@...cle.com>,
	Jann Horn <jann@...jh.net>, Jonathan Corbet <corbet@....net>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Kees Cook <keescook@...omium.org>, Paul Moore <paul@...l-moore.com>,
	Sargun Dhillon <sargun@...gun.me>,
	"Serge E . Hallyn" <serge@...lyn.com>,
	Shuah Khan <shuah@...nel.org>, Tejun Heo <tj@...nel.org>,
	Thomas Graf <tgraf@...g.ch>, Will Drewry <wad@...omium.org>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Linux API <linux-api@...r.kernel.org>,
	LSM List <linux-security-module@...r.kernel.org>,
	Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next v8 00/11] Landlock LSM: Toward unprivileged
 sandboxing

Hi Mickaël,

On Mon, Apr 02, 2018 at 12:04:36AM +0200, Mickaël Salaün wrote:
> >> vDSO is a code mapped for all processes. As you said, these processes
> >> may use it or not. What I was thinking about is to use the same concept,
> >> i.e. map a "shim" code into each processes pertaining to a particular
> >> hierarchy (the same way seccomp filters are inherited across processes).
> >> With a seccomp filter matching some syscall (e.g. mount, open), it is
> >> possible to jump back to the shim code thanks to SECCOMP_RET_TRAP. This
> >> shim code should then be able to emulate/patch what is needed, even
> >> faking a file opening by receiving a file descriptor through a UNIX
> >> socket. As did the Chrome sandbox, the seccomp filter may look at the
> >> calling address to allow the shim code to call syscalls without being
> >> catched, if needed. However, relying on SIGSYS may not fit with
> >> arbitrary code. Using a new SECCOMP_RET_EMULATE (?) may be used to jump
> >> to a specific process address, to emulate the syscall in an easier way
> >> than only relying on a {c,e}BPF program.
> >>
> > 
> > This could indeed be done, but I think that Tycho's approach is much
> > cleaner and probably faster.
> > 
> 
> I like it too but how does this handle file descriptors?

I think it could be done fairly simply, the most complicated part is
probably designing an API that doesn't suck. But the basic idea would
be:

struct seccomp_notif_resp {
    __u64 id;
    __s32 error;
    __s64 val;
    __s32 fd;
};

if the handler responds with fd >= 0, we grab the tracer's fd,
duplicate it, and install it somewhere in the tracee's fd table. Since
things like socket() will want to return the fd number as its
installed and the handler doesn't know that, we'll probably want some
way to indicate that the kernel should return this value. We could
either mandate that if fd >= 0, that's the value that will be returned
from the syscall, or add another flag that says "no, install the fd,
but really return what's in val instead).

I guess we can't mandate that we return fd, because e.g. netlink
sockets can sometimes return fds as part of the netlink messages, and
not as the return value from the syscall.

Tycho

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.