Date: Mon, 6 Nov 2017 21:28:02 -0600
From: "Serge E. Hallyn" <serge@...lyn.com>
To: Boris Lukashev <blukashev@...pervictus.com>
Cc: "Serge E. Hallyn" <serge@...lyn.com>, Daniel Micay <danielmicay@...il.com>, Mahesh Bandewar (महेश बंडेवार) <maheshb@...gle.com>, Mahesh Bandewar <mahesh@...dewar.net>, LKML <linux-kernel@...r.kernel.org>, Netdev <netdev@...r.kernel.org>, Kernel-hardening <kernel-hardening@...ts.openwall.com>, Linux API <linux-api@...r.kernel.org>, Kees Cook <keescook@...omium.org>, "Eric W . Biederman" <ebiederm@...ssion.com>, Eric Dumazet <edumazet@...gle.com>, David Miller <davem@...emloft.net>
Subject: Re: Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces

On Mon, Nov 06, 2017 at 07:01:58PM -0500, Boris Lukashev wrote:
> On Mon, Nov 6, 2017 at 6:39 PM, Serge E. Hallyn <serge@...lyn.com> wrote:
> > Quoting Boris Lukashev (blukashev@...pervictus.com):
> >> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn <serge@...lyn.com> wrote:
> >> > Quoting Daniel Micay (danielmicay@...il.com):
> >> >> Substantial added attack surface will never go away as a problem. There
> >> >> aren't a finite number of vulnerabilities to be found.
> >> >
> >> > There's varying levels of usefulness and quality. There is code which I
> >> > want to be able to use in a container, and code which I can't ever see a
> >> > reason for using there. The latter, especially if it's also in a
> >> > staging driver, would be nice to have a toggle to disable.
> >> >
> >> > You're not advocating dropping the added attack surface, only adding a
> >> > way of dealing with an 0day after the fact. Privilege raising 0days can
> >> > exist anywhere, not just in code which only root in a user namespace can
> >> > exercise. So from that point of view, ksplice seems a more complete
> >> > solution. Why not just actually fix the bad code block when we know
> >> > about it?
> >> >
> >> > Finally, it has been well argued that you can gain many new caps from
> >> > having only a few others. Given that, how could you ever be sure that,
> >> > if an 0day is found which allows root in a user ns to abuse
> >> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> >> > would suffice? It seems to me that the existing control in
> >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> >> > in that case.
> >> >
> >> > -serge
> >>
> >> This seems to be heading toward "we need full zones in Linux" with
> >> their own procfs and sysfs namespace and a stricter isolation model
> >> for resources and capabilities. So long as things can happen in a
> >> namespace which have a privileged relationship with host resources,
> >> this is going to be cat-and-mouse to one degree or another.
> >>
> >> Containers and namespaces don't have a one-to-one relationship, so I'm
> >> not sure that's the best term to use in the kernel security context
> >
> > Sorry - what's not the best term to use?
>
> Pardon, "containers," since they're namespaces+system construct.
>
> >
> >> since there's a bunch of userspace and implementation delta across the
> >> different systems (with their own security models and so forth).
> >> Without accounting for what a specific implementation may or may not
> >> do, and only looking at "how do we reduce privileged impact on parent
> >> context from unprivileged namespaces," this patch does seem to provide
> >> a logical way of reducing the privileges available in such a namespace
> >> and often needed to mount escapes/impact parent context.
> >
> > What different implementations do is irrelevant - as an unprivileged user
> > I can always, with no help, create a new user namespace mapping my current
> > uid to root, and exercise this code. So the security model implemented
> > by a particular userspace namespace-using driver doesn't matter, as it
> > only restricts me if I choose to use it.
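The claim that any unprivileged user can reach namespaced-root code can be checked directly. The following is a minimal sketch, assuming Linux and Python 3.9+ (for `os.waitstatus_to_exitcode`); the function name `become_root_in_new_userns` and its 0/1/2 result codes are made up for illustration. It forks a child, calls unshare(2) with CLONE_NEWUSER through `ctypes`, and writes one-line uid_map/gid_map entries that map the caller's own uid and gid to 0, which user_namespaces(7) allows without any privilege in the parent namespace.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <sched.h>

_libc = ctypes.CDLL(None, use_errno=True)


def _write_proc(path: str, data: str) -> None:
    # These /proc files must be written with a single write(2) call.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, data.encode())
    finally:
        os.close(fd)


def _child(outer_uid: int, outer_gid: int) -> int:
    if _libc.unshare(CLONE_NEWUSER) != 0:
        return 1  # refused: sysctl, seccomp policy, namespace limit, ...
    try:
        # An unprivileged process must deny setgroups(2) before writing gid_map.
        _write_proc("/proc/self/setgroups", "deny")
        _write_proc("/proc/self/uid_map", f"0 {outer_uid} 1\n")
        _write_proc("/proc/self/gid_map", f"0 {outer_gid} 1\n")
    except OSError:
        return 2
    return 0 if os.getuid() == 0 else 2


def become_root_in_new_userns() -> int:
    """Fork a child that maps the caller's uid/gid to 0 in a new user namespace.

    Returns 0 if the child became uid 0 there, 1 if unshare() was refused,
    2 on any other failure. The parent's own namespaces are left untouched.
    """
    uid, gid = os.getuid(), os.getgid()
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            code = _child(uid, gid)
        finally:
            os._exit(code)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

A result of 1 is what one would expect where unprivileged user namespaces are turned off, for instance through the distribution-specific /proc/sys/kernel/unprivileged_userns_clone knob mentioned above or a container's seccomp profile; a result of 2 can occur if an LSM restricts the freshly created namespace. The fork keeps the experiment out of the calling process, since unshare(CLONE_NEWUSER) is permanent for the task that calls it.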
> >
> > But, I guess you're actually saying that some program might know that it
> > should never use network code so want to drop CAP_NET_*? And you're
> > saying that a "global capability bounding set" might be useful?
> >
>
> The "global capability bounding set" with forced inheritance can be
> used to prevent the vector you describe wherein the capability of UID
> 0 in the child NS is restricted from the parent implicitly, so yes,
> that nomenclature seems appropriate.
>
> > Would it be better to actually implement it as a new bounding set that
> > is maintained across user namespace creations, but is per-task (inherited
> > by children of course)? Instead of a sysctl?
> >
> > -serge
>
> In line with the previous comment, the inheritance across subsequent
> invocations should be forced to prevent the context you described.
> Please pardon my ignorance, not sure what you mean in terms of
> "per-task" across namespace creation.

I meant each task has a perm_cap_bset next to the cap_bset. So task p1
(if it has privilege) can drop CAP_SYS_ADMIN from perm_cap_bset, and p2
(if it has privilege) can drop CAP_NET_ADMIN. When p1 creates a new
user_ns, the init task of that namespace has its cap_bset set to all
caps except CAP_SYS_ADMIN.

I think for simplicity perm_cap_bset would *only* affect the filling of
cap_bset at user namespace creation. So if you wanted to drop a
capability from your own cap_bset as well, you'd have to do that
separately.
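The proposed perm_cap_bset semantics can be made concrete with a small model. This is a sketch under stated assumptions, not kernel code: perm_cap_bset exists only as a proposal in this thread, the `privileged` flag is a hypothetical stand-in for "if it has privilege" (the thread does not say which capability would gate the drop), and `fork`, `drop_perm_cap`, `drop_bset_cap` and `create_user_ns` are hypothetical helpers. Capability numbers follow <linux/capability.h>, with CAP_LAST_CAP fixed at 37 (CAP_AUDIT_READ), the last capability in kernels of this period.

```python
from dataclasses import dataclass, replace

# Capability numbers as defined in <linux/capability.h>.
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21
CAP_LAST_CAP = 37  # CAP_AUDIT_READ; assumption: 2017-era kernel
CAP_FULL_SET = (1 << (CAP_LAST_CAP + 1)) - 1


def bit(cap: int) -> int:
    return 1 << cap


@dataclass(frozen=True)
class Task:
    cap_bset: int = CAP_FULL_SET
    perm_cap_bset: int = CAP_FULL_SET  # hypothetical: the proposed new set
    privileged: bool = True  # hypothetical stand-in for "if it has privilege"


def fork(parent: Task) -> Task:
    # Children inherit both bounding sets unchanged.
    return replace(parent)


def drop_perm_cap(task: Task, cap: int) -> Task:
    # Removes cap from perm_cap_bset only; the task's own cap_bset is untouched.
    if not task.privileged:
        raise PermissionError("dropping from perm_cap_bset needs privilege")
    return replace(task, perm_cap_bset=task.perm_cap_bset & ~bit(cap))


def drop_bset_cap(task: Task, cap: int) -> Task:
    # The separate drop from the task's own cap_bset.
    if not task.privileged:
        raise PermissionError("dropping from cap_bset needs privilege")
    return replace(task, cap_bset=task.cap_bset & ~bit(cap))


def create_user_ns(creator: Task) -> Task:
    # Mainline fills the new init task's cap_bset with CAP_FULL_SET; under the
    # proposal it is filled from the creator's perm_cap_bset instead. Carrying
    # perm_cap_bset along means nested namespaces cannot regain dropped caps.
    return Task(
        cap_bset=CAP_FULL_SET & creator.perm_cap_bset,
        perm_cap_bset=creator.perm_cap_bset,
        privileged=True,  # root in the new namespace
    )


if __name__ == "__main__":
    p1 = drop_perm_cap(Task(), CAP_SYS_ADMIN)
    p2 = drop_perm_cap(Task(), CAP_NET_ADMIN)
    ns1 = create_user_ns(p1)
    print(ns1.cap_bset == (CAP_FULL_SET & ~bit(CAP_SYS_ADMIN)))  # True
    print(p1.cap_bset == CAP_FULL_SET)  # True: p1's own cap_bset is unchanged
    print(bool(create_user_ns(p2).cap_bset & bit(CAP_NET_ADMIN)))  # False
```

Copying perm_cap_bset into the new namespace is what provides the forced inheritance Boris asks for: a namespace nested under ns1 is filled from the same mask, so it cannot regain CAP_SYS_ADMIN. Dropping a capability from the caller's own cap_bset stays a separate step, which in current kernels is prctl(2) with PR_CAPBSET_DROP.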