Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 29 Apr 2016 01:45:25 +0200
From: Mickaël Salaün <>
To: Kees Cook <>
Cc: linux-security-module <>,
 Andreas Gruenbacher <>, Andy Lutomirski
 <>, Arnd Bergmann <>,
 Casey Schaufler <>,
 Daniel Borkmann <>, David Drysdale
 <>, Eric Paris <>,
 James Morris <>, Julien Tinnes <>,
 Michael Kerrisk <>, Paul Moore <>,
 "Serge E . Hallyn" <>, Stephen Smalley <>,
 Tetsuo Handa <>,
 Will Drewry <>, Linux API <>,
 "" <>
Subject: Re: [RFC v1 00/17] seccomp-object: From attack surface reduction to

Thanks for the comments. Here are mine:

On 28/04/2016 04:36, Kees Cook wrote:
> Okay, I've read through this whole series now (sorry for the huge
> delay). I think that it is overly complex for what it results in
> providing. Here are some background thoughts I had:

It may be a bit complex but my goal was to create a generic framework easily extensible in the future.

> 1) People have asked for "dereferenced argument inspection" (I will
> call this DAI...), in that they would like to be able to process
> arguments like how BPF traditionally processes packets. This series
> doesn't provide that. Rather, it provides static checks against
> specific arguments types (currently just path checks).

The thing is, a network packet can be filtered based on some basic type checks (e.g. integer, bit fields, enum) but it seems really complex to be able to check for stuff like file path (without even thinking about a BPF regex engine :).
However, the approach taken in this series is to allow complex checks based on a path *object* which could not be directly possible by inspecting the argument. Indeed, the kernel can expose more information than just a (user-controlled) string: parent directory, inode, device, mount point…

I think that the need is not to be able to (directly) dereference syscall arguments but to be able to evaluate arguments.

Moreover, exposing raw arguments in the seccomp_data struct can be tricky because of possible multiple pointer indirections.

Last but not least, giving the ability to a BPF to interpret syscall arguments can lead to inconsistent evaluation (incorrect mirroring of the OS code and state, cf.

However, another approach could be to expose a high-level (canonicalized path, inode values…) object in the seccomp_data struct but this seems more complex and less flexible. How to store path reference object? How to check path hierarchy?

> 2) When I dig into the requirements people have around DAI, it's
> mostly about checking path names. There is some interest in some of
> the network structures, but mostly it's path names. This series
> certainly underscores this since your first example is path names. :)

Indeed. The same approach could be used to filter arguments as socket.

> 3) Solving ToCToU should also solve performance problems. For example,
> this series, on a successful syscall, will look up a pathname twice
> (once in seccomp, then again in the syscall, and then compares the
> results in the LSM as a ToCToU back-stop). This seems like a waste of
> effort, since this reimplements the work the kernel is already doing
> to pass the resulting structure to the LSM hooks. As such, since this
> series is doing static checks and not allowing byte processing for
> DAI, I'm convinced that it should entirely happen in the LSM hooks.

This could be misleading. This series use the audit cache to not evaluate multiple path (and prevent ToCToU). I think there is then no more penalty than a syscall using multiple times the same file descriptor. If the checked file is not modified by another process, the file (path) cache check is only an integer/address comparison.

According to the current seccomp and syscall workflow in Linux, I don't see any other way to check an argument without modifying the current kernel behavior (e.g. ptrace hook) or locking resources (i.e. file).

> 4) Performing the checks in the LSM hooks carries a risk of exposing
> the syscall's argument processing code to an attacker, but I think
> that is okay since very similar code would already need to be written
> to do the same thing before entering the syscall. The only way out of
> this, I think, would be to standardize syscall argument processing.

I created a basic, but generic and non-intrusive, syscall argument table to store useful types (e.g. int, char *) but standardizing syscall argument processing seems a big and long term task. However, using a cache similar to audit for some argument types seems more realistic ;)

> 5) If we can standardize syscall argument processing, we could also
> change when it happens, and retain the results for the syscall,
> allowing for full byte processing style of DAI. e.g. copy userspace to
> kernel space, do BPF on the argument, if okay, pass the kernel copy to
> the syscall where it continues the processing. If the kernel copy
> wasn't already created by seccomp, the syscall would just make that
> copy itself, etc.

Cf. my first comment and the mirroring code problem (and complexity).

> So, I see DAI as going one of two ways:
> a) rewrite all syscall entry to use a common cacheable argument parser
> and offering true BPF processing of the argument bytes.
> b) use the existing LSM hooks and define a policy language that can be
> loaded ahead of time.
> Doing "a" has many problems, I think. Not the least of which is that I
> can't imagine a way for such an architectural change to not have
> negative performance impacts for the regular case.

Agree :)

> Doing "b" means writing a policy engine. I would expect it to look a
> lot like either AppArmor or TOMOYO. TOMOYO has network structure
> processing, so probably it would look more like TOMOYO if you wanted
> more than just file paths. Maybe a seccomp LSM could share logic from
> one of the existing path-based LSMs.

An interesting thing about BPF is that it's already an engine and would be interesting to do more than denying accesses but, for example, faking syscall return values as it is already possible. Moreover, this keeps a small attack surface.

> Another note I had for this series was that because the checker tries
> to keep a cached struct path, it allows unprivileged users to check
> for path names existing or not, regardless of the user's permissions.

The registration of a new checker (SECCOMP_ADD_CHECKER_GROUP) against a file path should only be allowed according to the current user permissions, and only a filter from the same thread can checks against this same file path. So I don't see how this series allows unprivileged users to check for path names regardless of their permissions.

> Instead, you have to check the path against the policy each time.

Well, each time doesn't means a path parsing but a cache comparison (like does the kernel anyway).

> AppArmor does this efficiently with a pre-built deterministic finite
> automatons (built from regular expressions), and TOMOYO just does
> string compares and limited glob parsing every time.
> So, I can't take this as-is, but I'll take the one fix near the start.
> :) I hope this isn't too discouraging, since I'd love to see this
> solved. Hopefully you can keep chipping away at it!

Of course this series is not ready as-is, but I'm convinced the main ideas could make happy sandbox developers!


Download attachment "signature.asc" of type "application/pgp-signature" (456 bytes)

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ