kernel-hardening - Re: Kernel complexity


Message-ID: <X9cHepkKe0oSXtUI@kroah.com>
Date: Mon, 14 Dec 2020 07:34:34 +0100
From: Greg KH <gregkh@...uxfoundation.org>
To: stefan.bavendiek@...lbox.org
Cc: Jann Horn <jannh@...gle.com>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	linux-hardening@...r.kernel.org
Subject: Re: Kernel complexity

On Sun, Dec 13, 2020 at 08:04:42PM +0100, stefan.bavendiek@...lbox.org wrote:
> > I'm not sure whether this would really be all that helpful for
> > userspace sandboxing decisions - as far as I know, userspace normally
> > isn't in a position where it can really choose which syscalls it wants
> > to use, but instead the choice of syscalls to use is driven by the
> > requirements that userspace has. If you tell userspace that write()
> > can hit tons of kernel code, it's not like userspace can just stop
> > using write(); and if you then also tell userspace that pwrite() can
> > also hit a lot of kernel code, that may be misinterpreted as meaning
> > that pwrite() adds lots of risk while actually, write() and pwrite()
> > reach (almost) the same areas of code. Also, the areas of code that a
> > syscall like write() can hit depend hugely on file system access
> > policies.
> 
> Some issues I have come across revolve around how much attention the
> avoidance of certain system calls should get based on the risk.
> Many applications e.g. like "file" include a seccomp filter that
> restricts most systemcalls from ever being used, without using a broker
> architecture. This is feasible for small applications that do not always
> need to do dangerous things like execve or open (for write). 
> This decision is however often made without extensive research on what
> systemcalls provide dangerous functionality. The idea was to change that
> by providing a risk score for systemcalls.

Like Jann said, syscalls are generally _not_ the correct level at which
to do something like this.

Consider a single 'read' syscall of 1 byte out of a file.  Should be
pretty trivial, as that read could be on a sysfs file that merely
returns a single value that is stored in kernel memory for a
configuration option.  That's a simple thing, so all is good, right?

But what about sysfs files that change kernel state when you read a
value?  Depending on the file, that sometimes is the case, right?

Then think about reading 1 byte from a file on an NFS-mounted
filesystem, over a PPP networking connection that runs over a
USB-serial device attached to the system.  The number of layers
involved here is very, very non-trivial, and yet that was the same
single byte being read in a syscall.
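
To make the point above concrete, here is a minimal sketch (not part
of the original mail) that issues the same 1-byte read against files
on different filesystems.  The helper names read_one_byte and
filesystem_type are hypothetical, the sysfs and procfs paths are
assumptions about a typical Linux system, and the filesystem lookup
is a best-effort parse of /proc/self/mounts.

```python
import os
import tempfile


def read_one_byte(path):
    """Issue one read() of at most 1 byte; the call is the same for any path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, 1)
    finally:
        os.close(fd)


def filesystem_type(path, mounts_file="/proc/self/mounts"):
    """Best-effort filesystem type backing path (Linux only), else "unknown"."""
    real = os.path.realpath(path)
    best_len, best_type = -1, "unknown"
    try:
        with open(mounts_file) as mounts:
            for line in mounts:
                fields = line.split()
                if len(fields) < 3:
                    continue
                mount_point, fs_type = fields[1], fields[2]
                prefix = mount_point.rstrip("/") + "/"
                if real == mount_point or real.startswith(prefix):
                    # Longest matching mount point wins; later entries win ties.
                    if len(mount_point) >= best_len:
                        best_len, best_type = len(mount_point), fs_type
    except OSError:
        pass  # No readable mounts table (e.g. not Linux): report "unknown".
    return best_type


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x")
    # Assumed example paths: an ordinary file, a sysfs file, a procfs file.
    for path in (tmp.name, "/sys/class/net/lo/mtu", "/proc/version"):
        try:
            data = read_one_byte(path)
        except OSError as exc:
            print(f"{path}: not readable here ({exc})")
            continue
        print(f"{path}: fs={filesystem_type(path)} byte={data!r}")
    os.unlink(tmp.name)
```

The caller's side is identical in every iteration; which kernel code
actually runs depends on what backs the path, and the syscall number
alone says nothing about that.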

There's loads of "state" in a kernel system for the configuration of the
system and hardware (oh yeah, you need to think about what the hardware
state is, what hardware is involved, and the like).

Good luck!

greg k-h
