Date: Fri, 18 Dec 2015 03:49:04 +0300
From: Solar Designer <>
Cc: Andrey Vagin <>, Sergey Bronnikov <>
Subject: Re: CVE Request: Linux kernel: privilege escalation in user namespaces

On Thu, Dec 17, 2015 at 02:39:58PM -0800, John Johansen wrote:
> Jann Horn reported a privilege escalation in user namespaces to the
> lkml mailing list
> if a root-owned process wants to enter a user
> namespace for some reason without knowing who owns it and
> therefore can't change to the namespace owner's uid and gid
> before entering, as soon as it has entered the namespace,
> the namespace owner can attach to it via ptrace and thereby
> gain access to its uid and gid.

This appears related:

Back in 2005, I performed a security audit of soon-to-be-released
OpenVZ.  (It was very nice of SWsoft/Parallels to put the effort and
funding into this before making the project public.)  The security audit
report has finally been made public here:

As I recall, one of the changes OpenVZ developers made during the audit,
in response to very early findings, was to prevent a process entering a
container (called VPS or VE at the time, for Virtual Environment) from
being ptrace'd by a process already running in the container.  This
scenario would be relevant when using the "vzctl enter ..." and "vzctl
exec ..." commands.  As something fixed before audit end, this isn't
fully reflected in the audit report (which I now regret, as it would
have helped refresh my memory), except for this indirect note about vzctl:

| 2.2. Testing and review of "strace" logs revealed that only the first 16
| fd's were being closed on VPS entry.  This needs to be corrected.  Also,
| the fd's are being closed _after_ the ioctl call, which is not great,
| although the risk is now mitigated by having the VPS-entering process
| protected from ptrace(2).

My point is that it does make sense to protect a container- or
namespace-entering process from attacks by the container/namespace.
There are attacks this may mitigate.  (Other mitigations are also
needed, though: including e.g. closing the fd's, as mentioned above, and
allocating a new pty, if applicable.  vzctl does these things.  And the
specific "first 16 fd's" issue was corrected at the time, as well.)
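
To illustrate the fd-closing point, here is a minimal Python sketch (not
vzctl's actual code) of closing every inherited descriptor except an
explicit keep-set before entering, rather than only the first 16.  The
name close_fds_except and its parameters are hypothetical, and the
default upper bound taken from sysconf(SC_OPEN_MAX) is an assumption
that may be very large on some systems.

```python
import os


def close_fds_except(keep, limit=None):
    """Close every open fd in [0, limit) that is not listed in `keep`.

    Returns the list of descriptors that were actually closed.
    Hypothetical helper for illustration only.
    """
    if limit is None:
        # Assumption: SC_OPEN_MAX is a usable upper bound on fd numbers.
        limit = os.sysconf("SC_OPEN_MAX")
    closed = []
    for fd in range(limit):
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            continue  # fd was not open; nothing to do
        closed.append(fd)
    return closed
```

A real entry helper would presumably call something like this before
the container-entering ioctl (or setns(2)) rather than after it, per the
ordering remark in item 2.2 above.  When nothing beyond fd's 0-2 needs
to survive, os.closerange(3, limit) is a simpler alternative.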

Then, one of the recommendations for hardening OpenVZ security that I
listed in the report was:

| 3. The most reliable way to deal with attacks based on matching UIDs is
| to simply not have those, but rather translate full 32-bit unique
| UIDs/GIDs to VPS-specific ones on kernel interfaces.  This has been
| briefly discussed on the mailing list.  The biggest disadvantage that
| was mentioned is that it would make it harder to migrate VPSes across
| nodes.
| It was suggested that matching UIDs/GIDs could continue to be used in
| different VPSes, but they would be different from those the host system
| would use.  In order to ensure cross-VPS security even if an attacker
| would manage to escape from a VPS' chroot jail, permissions on
| /vz/private would need to be set to 700 (with host root as the owner),
| which is being recommended above for other reasons anyway.  Additionally,
| the meaning of certain capabilities (CAP_DAC_OVERRIDE, etc.) would need
| to be "virtualized" when in VE context.  That is, the DAC override would
| apply only to files whose owner UIDs fall within the VPS' range, etc.
| Unfortunately, with all these considerations, this does appear to be not
| so trivial to implement.  So this is more of an idea for further
| discussion rather than a final recommendation.

As far as I'm aware, this was never implemented.
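
To make the idea in recommendation 3 concrete, here is a purely
illustrative Python sketch of translating VPS-local IDs to unique host
IDs and of "virtualizing" a DAC override check.  All names (IdRange,
to_host_id, dac_override_applies) and the example offsets are
hypothetical, and this is not how OpenVZ or the kernel implements
anything.  The three-field range layout merely mirrors the (inside,
outside, length) lines that user namespaces expose in /proc/PID/uid_map.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdRange:
    inside_first: int   # first ID as seen inside the VPS
    outside_first: int  # first ID as seen by the host
    count: int          # number of consecutive IDs mapped


def to_host_id(ranges: Sequence[IdRange], vps_id: int) -> Optional[int]:
    """Translate a VPS-local UID/GID to the host ID, or None if unmapped."""
    for r in ranges:
        if r.inside_first <= vps_id < r.inside_first + r.count:
            return r.outside_first + (vps_id - r.inside_first)
    return None


def in_vps_range(ranges: Sequence[IdRange], host_id: int) -> bool:
    """True if a host ID falls inside any range assigned to this VPS."""
    return any(
        r.outside_first <= host_id < r.outside_first + r.count
        for r in ranges
    )


def dac_override_applies(ranges: Sequence[IdRange],
                         file_owner_host_uid: int) -> bool:
    """A 'virtualized' CAP_DAC_OVERRIDE: honored only for files whose
    owner UID falls within the VPS' own range."""
    return in_vps_range(ranges, file_owner_host_uid)


# Example: a hypothetical VPS given 65536 host IDs starting at 100000.
VPS_RANGES = [IdRange(inside_first=0, outside_first=100000, count=65536)]
```

With these example ranges, root inside the VPS (UID 0) corresponds to
host UID 100000, so it matches neither host root nor any ID handed to
another VPS that was given a disjoint range, and the virtualized
override would not apply to a file owned by host UID 0.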

Fast forward to 2015: OpenVZ is experimenting with user namespaces:

| 15 Oct 2015
| "Call for testing: Start CT in a new user namespace: 1:1 user mapping"

| Now CT starts in a new user namespace. This allows us
| * to remove our capabilities (CAP_VE_*)
| * to improve security of our containers, because a process doesn't have privileges outside the container

| Testing
| * need to execute tests to check security of containers
| * execute all tests, because these changes are touching very general parts

I think this applies to experimental RHEL7-based OpenVZ kernels, rather
than to the RHEL6-based (let alone RHEL5-based) OpenVZ kernels that most
people (who use OpenVZ at all) use now.

I have not yet found time to look into this, but I recognize that now
might be the right time to review this and make it right.

A concern is that old, previously-fixed issues like ptrace'ing a
container-entering process might be reopened.  Another concern is that
unprivileged user namespaces are a (well-known by now?) risk on their
own.  Thus, there needs to be a way to configure a RHEL7/OpenVZ kernel
such that it can use user namespaces to enhance container security, but
users (including host system non-root users) can't use and abuse those -
and this must be the default.

Other than that, it looks like this is a way and an opportunity to
address this long-standing OpenVZ hardening recommendation of mine.

I am not yet looking at Linux containers beyond OpenVZ, but as they
mature I think they'll need to consider the same issues, and my 2005
audit report might still be a relevant checklist.

I'd appreciate comments from people who are more up-to-date on this.


