oss-security - Re: Linux: Disabling network namespaces

Message-ID: <20240421200625.GA16869@openwall.com>
Date: Sun, 21 Apr 2024 22:06:25 +0200
From: Solar Designer <solar@...nwall.com>
To: oss-security@...ts.openwall.com
Subject: Re: Linux: Disabling network namespaces

On Sat, Apr 20, 2024 at 09:33:07PM +0000, Jordan Glover wrote:
> bubblewrap has a --disable-userns option which prevents creation of nested namespaces (from the manpage):
> 
>        --disable-userns
> Prevent the process in the sandbox from creating further user namespaces, so that it cannot rearrange the filesystem namespace or do other more complex namespace modification. This is currently implemented by setting the user.max_user_namespaces sysctl to 1, and then entering a nested user namespace which is unable to raise that limit in the outer namespace. This option requires --unshare-user, and doesn't work in the setuid version of bubblewrap.
> 
> Flatpak uses this (or a seccomp filter) to block nested namespaces, as these can bypass its security design. For this reason, Firefox's own sandbox doesn't use namespaces in Flatpak, see https://bugzilla.mozilla.org/show_bug.cgi?id=1756236

Thanks, I didn't expect it was this advanced already.

In what exact way would nested namespaces bypass the security design of
Flatpak?  Is this about the kernel's attack surface exposed by
capabilities in a namespace or something else?  I guess capabilities are
also dropped in the nested namespace?

After reviewing some kernel code, I have doubts as to how effective the
dropping of capabilities in a namespace actually is.

security/commoncap.c: cap_capable() includes this:

                /*
                 * The owner of the user namespace in the parent of the
                 * user namespace has all caps.
                 */
                if ((ns->parent == cred->user_ns) && uid_eq(ns->owner, cred->euid))
                        return 0;

This check is only reached when cap_capable() is called for a target
namespace other than the one the credentials are from.  However, such
uses do exist, for example via Netlink, which would expose e.g. Netfilter:

net/netlink/af_netlink.c:

/**
 * netlink_net_capable - Netlink network namespace message capability test
 * @skb: socket buffer holding a netlink command from userspace
 * @cap: The capability to use
 *
 * Test to see if the opener of the socket we received the message
 * from had when the netlink socket was created and the sender of the
 * message has the capability @cap over the network namespace of
 * the socket we received the message from.
 */
bool netlink_net_capable(const struct sk_buff *skb, int cap)
{
        return netlink_ns_capable(skb, sock_net(skb->sk)->user_ns, cap);
}

So I worry whether even with all namespaces in a sandbox having dropped
capabilities, an attack can still be arranged (with a pair of namespaces
one nested in the other) where a task effectively "has all caps" for a
dangerous operation like configuring Netfilter due to it hitting code
paths like this, which bypass capability bit checks.

The above finding may be a reason for us to prefer making capabilities
in a namespace ineffective vs. dropping capabilities.  In the context of
my idea/proposal for a new sysctl, it could be better for it to work as
I had described, overriding the return value of security_capable(),
instead of e.g. hooking the return of create_user_ns() and dropping the
new cred's capabilities.
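
To illustrate the difference, here is a minimal userspace sketch.  It is
not kernel code: the model_* types and functions are hypothetical
stand-ins that only mirror the loop structure of cap_capable(), and the
model_caps_disabled_in_userns flag is my assumption of how an override
at the security_capable() level might behave (deny whenever the
credentials come from a non-initial user namespace).  The scenario is a
task with all capability bits dropped that owns a nested user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CAP_NET_ADMIN 12 /* same numeric value as in linux/capability.h */

/* Simplified stand-ins for the kernel's user_namespace and cred. */
struct model_userns {
        const struct model_userns *parent; /* NULL for the initial namespace */
        int level;                         /* nesting depth, 0 for initial */
        unsigned int owner;                /* euid of the creator */
};

struct model_cred {
        const struct model_userns *user_ns; /* namespace the creds are from */
        unsigned int euid;
        uint64_t cap_effective;             /* effective capability bits */
};

/* Mirrors the loop structure of cap_capable(); true means "capable". */
static bool model_cap_capable(const struct model_cred *cred,
                              const struct model_userns *target, int cap)
{
        assert(cred != NULL && target != NULL);
        for (const struct model_userns *ns = target; ns; ns = ns->parent) {
                /* Same namespace: the capability bits decide. */
                if (ns == cred->user_ns)
                        return ((cred->cap_effective >> cap) & 1) != 0;
                /* Target is not below the creds' namespace: give up. */
                if (ns->level <= cred->user_ns->level)
                        return false;
                /* Owner of a direct child namespace: bits are not consulted. */
                if (ns->parent == cred->user_ns && ns->owner == cred->euid)
                        return true;
        }
        return false;
}

/* Hypothetical knob: make capabilities ineffective in user namespaces. */
static bool model_caps_disabled_in_userns = false;

/* Assumed override point, sitting in front of the check above. */
static bool model_security_capable(const struct model_cred *cred,
                                   const struct model_userns *target, int cap)
{
        if (model_caps_disabled_in_userns && cred->user_ns->parent != NULL)
                return false;
        return model_cap_capable(cred, target, cap);
}

struct model_result {
        bool own_ns;            /* check against the task's own namespace */
        bool child_ns;          /* check against the nested namespace */
        bool child_ns_override; /* same check with the knob enabled */
};

static struct model_result model_run(void)
{
        struct model_userns init_ns = { NULL, 0, 0 };
        struct model_userns sandbox = { &init_ns, 1, 1000 };
        struct model_userns nested = { &sandbox, 2, 1000 };
        /* Task in the sandbox namespace with every capability dropped. */
        struct model_cred task = { &sandbox, 1000, 0 };
        struct model_result r;

        model_caps_disabled_in_userns = false;
        r.own_ns = model_security_capable(&task, &sandbox, CAP_NET_ADMIN);
        r.child_ns = model_security_capable(&task, &nested, CAP_NET_ADMIN);

        model_caps_disabled_in_userns = true;
        r.child_ns_override =
            model_security_capable(&task, &nested, CAP_NET_ADMIN);
        model_caps_disabled_in_userns = false;
        return r;
}
```

In this model, model_run() reports the task as not capable over its own
namespace (the bits are cleared), capable over the nested namespace it
owns (the owner shortcut never looks at the bits), and not capable over
the nested namespace once the override is enabled.  Whether the real
code paths behave the same way end to end would need to be confirmed
against the kernel itself.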

I hope the Ubuntu/AppArmor solution is also safe in this respect, as it
sounds like it similarly makes capabilities ineffective instead of
dropping them.

Alexander
