Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 14 Dec 2018 15:14:54 +0100
From: Jann Horn <>
To: Solar Designer <>
Cc:, Greg KH <>, 
	Yves-Alexis Perez <>, Brad Spengler <>, Jann Horn <>
Subject: Re: Linux kernel: userfaultfd bypasses tmpfs file
 permissions (CVE-2018-18397; since 4.11; fixed in 4.14.87 and 4.19.7)

On Wed, Dec 12, 2018 at 3:24 PM Solar Designer <> wrote:
> On Wed, Dec 12, 2018 at 01:27:13AM +0100, Jann Horn wrote:
> > NOTE: I have requested a CVE identifier, and I'm sending this message,
> > to make tracking of the fix easier; however, to avoid missing security
> > fixes without CVE identifiers, you should *NOT* be cherry-picking a
> > specific patch in response to a notification about a kernel security
> > bug.
> (I resisted the urge to comment on this piece in previous postings.)
> What should distros/users do, then?  Use latest mainline or upstream
> stable kernels?  That would expose them to the many recent bugs like
> this one, but which haven't yet been found (or not yet made public,
> which is worse).
> As far as I can tell, by far most Linux kernel vulnerabilities (that are
> eventually found and made public) are in relatively recent (as of that
> time) kernel versions.  So a user or a distro would avoid most
> vulnerabilities (that are eventually found and made public) by staying
> sufficiently behind current versions, and relying on backports, even if
> at risk of missing untracked vulnerabilities.  Currently this can be
> achieved e.g. by using RHEL7'ish kernels forked by Red Hat off 3.10, but
> probably not anything newer than that yet.  (And when RHEL7 was just
> released, its kernels were not quite ready for such use.  It takes
> even RHEL kernels a few years and a few hundred revisions to mature and
> become a lower security risk.  Fortunately, there's a previous RHEL at a
> few years and a few hundred revisions old yet still maintained during
> that time.)

I think one additional aspect here is the kernel config. From what
I've seen, distros tend to turn on all the config options because they
probably have some user, somewhere, who wants to use that feature; and
if you use that strategy for your kernel config, then yes, new
releases probably add new features and attack surface.

But since you're able to use a 3.10 kernel, evidently you don't need
those features. So I think it makes sense to, instead of comparing a
3.10 distro kernel and a 4.19 distro kernel, look at an old and a new
kernel with the same feature set enabled.

Looking at the public Linux kernel bugs I filed in our bugtracker
(which, of course, are a very small number of bugs and probably not
very representative):
"Linux: perf_event_open() can race with execve()"
probably exploitable since 3.7, since that's when
"Linux: UAF via double-fdput() in bpf(BPF_PROG_LOAD) error path"
exploitable since 4.4
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"Linux: reference count overflow using BPF maps"
exploitable since 4.4
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"Linux: arbitrary memory read on arm/arm64 via perf_event_open()"
My PoC was written against 3.10, so being on 3.10 doesn't help here.
"Linux: Stack overflow via ecryptfs and /proc/$pid/environ"
I think this probably also worked on v3.10, haven't tested though.
Newer kernels mitigate this bug class (kernel stack overflow) on
x86-64 and ARM64, turning it into a clean kernel crash.
"Linux: SELinux W+X protection bypass via AIO"
I think this is an old bug?
"Linux: eBPF verifier log leaks lower half of map pointer"
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"Linux: mincore() discloses uninitialized kernel heap pages"
introduced in 4.0
(depends on CONFIG_HUGETLB_PAGE, but I guess probably almost everyone
has that on)
"arbitrary read+write via incorrect range tracking in eBPF"
introduced in 4.14
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"eBPF verifier bug backported to 4.9-stable"
introduced in 4.12
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"Linux RNG flaws"
introduced in 4.8
"Linux: 4-byte infoleak via uninitialized struct field in compat
adjtimex syscall"
introduced in 4.13
depends on COMPAT
"Linux ext4: out-of-bounds memcpy via non-inline xattr"
introduced in 4.13
"Linux/Ubuntu: other users' coredumps can be read via setgid directory
and killpriv bypass"
I think this one's a really old bug.
"Linux: reiserfs: heap overflow in listxattr_filler()"
introduced in 2.6.30
"Linux: percpu refcounts on struct mount are racy"
introduced in 3.13
"Linux: insufficient shootdown for paging-structure caches"
introduced in 4.14
"Linux: arbitrary kernel read into dmesg via missing address check in
segfault handler"
introduced in 4.18
"Linux: kernel ptr leak via BPF: broken subtraction check"
introduced in 4.15
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"Linux: semi-arbitrary task stack read on ARM64 (and x86) via /proc/$pid/stack"
introduced in 2.6.29, I think
(depends on CONFIG_STACKTRACE, but that's probably on)
heightened impact on kernels before 4.4 if you don't have a backport
of "fork: unconditionally clear stack on fork"
"Linux: VMA use-after-free via buggy vmacache_flush_all() fastpath"
introduced in 3.15
"Linux: bpf verifier: 32-bit RSH verification doesn't truncate input
before the ALU op"
introduced in 4.15
depends on CONFIG_BPF_SYSCALL, which only exists since 3.18
"Linux: mremap() TLB flush too late with concurrent ftruncate()"
introduced in 3.2
heightened impact on kernels before 4.9
"Linux: userfaultfd bypasses tmpfs file permissions"
introduced in 4.16
depends on CONFIG_USERFAULTFD, which only exists since 4.3
"Linux: broken uid/gid mapping for nested user namespaces with >5 ranges"
introduced in 4.15

So by my count, that's roughly:

A) 5 bugs that were already in 3.10 (reiserfs, coredump leak, W+X
bypass, ARM64 perf_event_open(), perf_event_open()/execve() race)
B) 3 additional bugs that were already in 3.10, and where the bug was
worse in old kernels than in the affected one (UAF via late TLB flush;
infoleak from the stack), or where modern kernels would mitigate the
issue (stack overflow)
C) 8 bugs that are gated behind config flags that you won't have set
if you haven't enabled new features after 3.10 (BPF and userfaultfd)
D) 9 bugs that are newer than 3.10 and that might be compiled in even
if you haven't enabled new features since 3.10 (user namespaces, VMA
UAF, kernel read into dmesg, TLB race, percpu refcounts, ext4, compat
adjtimex, RNG issues, mincore heap leak)

(But again, this isn't exactly a large sample set.)

> A question to ask may be: out of Linux kernel vulnerabilities being
> patched, are there more high and critical overall severity (e.g., as
> risk impact times risk probability) vulnerabilities found in "too
> recent" kernels than there are high and critical severity untracked
> vulnerabilities (also or instead) affecting "sufficiently old" kernels?
> My gut feeling is there are many more such vulnerabilities in "too
> recent" kernels than there are those untracked vulnerabilities in
> "sufficiently old" kernels.  (BTW, a vulnerability being untracked
> likely correlates with it being a lower risk probability at least for
> non-targeted attacks.)  Hence optimal strategy for a distro and their
> users is to stay with "sufficiently old" base versions and backport
> whatever is known to be worthy of a backport.
> There are no maintained upstream stable branches started long enough ago
> for them to be as mature as e.g. RHEL7 kernels are now.  Besides,
> upstream stable branches also suffer from lack of backports of fixes for
> untracked vulnerabilities.
> The recommendation to use latest mainline or upstream stable kernels is
> safe to give (and in a way even the most responsible one to give), but
> not necessarily the best to follow.
> I do not have a suggestion on what to do about that as it relates to
> recommendations/disclaimers on postings such as Jann's.  Ideally, we
> wouldn't have so many new security vulnerabilities being introduced to
> new Linux kernels all the time, but that seems unrealistic given the
> pace of Linux kernel development and growth.

I think it might be helpful to ensure that kernels used in
environments where you care about security are not configured with the
maximum amount of features possible, but instead adjusted to actual
requirements via kernel config and sysctls. Examples:

Regarding the specific bug that started this thread: userfaultfd is
enabled by distro kernels, but the only current usecase I'm aware of
is reduction of downtime for QEMU live migration. You probably don't
need it.
You might not need compat support.
You probably don't need support for every single filesystem Linux knows about.
eBPF is useful for some networking and performance tracing stuff, but
you probably don't actually need it to be available for non-root, even
if you do have a use for it.

This should let you avoid many bugs that are introduced as part of new
features; but of course, it doesn't do much against bugs introduced by
performance optimizations and such.

It sucks that distros shipping binary kernels kinda have to do the
opposite of this in order to fulfill their users' needs, at least for
config options where "build as a module" isn't an option. :( If
distros want to use a single kernel image for everything, perhaps
having more sysctls to lock down new features, in addition to the
kernel config, would help...

> > In Linux kernel versions since 4.11, userfaultfd can be used to write
> > arbitrary data into holes in sparse tmpfs files to which an attacker
> > has read-only access.
> >
> > This is CVE-2018-18397.
> >
> >
> >
> >
> >
> Interesting.  How did you find this?

I was specifically looking through the userfaultfd code for security
bugs. I think this was the first time I looked at the userfaultfd code
this way (instead of just looking for ).

> Alexander
> P.S. I guess Jann's message did not reach subscribers who are on Gmail
> and such because of's DMARC policy.  So I made sure to quote
> all of it above.

Bleeh... I guess maybe I should use a account for that...

Powered by blists - more mailing lists

Your e-mail address:

Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.