Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 5 Jul 2018 10:12:37 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Alexander Popov <alex.popov@...ux.com>
Cc: kernel-hardening@...ts.openwall.com, Kees Cook <keescook@...omium.org>,
	PaX Team <pageexec@...email.hu>,
	Brad Spengler <spender@...ecurity.net>,
	Andy Lutomirski <luto@...nel.org>, Tycho Andersen <tycho@...ho.ws>,
	Laura Abbott <labbott@...hat.com>,
	Mark Rutland <mark.rutland@....com>,
	Ard Biesheuvel <ard.biesheuvel@...aro.org>,
	Borislav Petkov <bp@...en8.de>,
	Richard Sandiford <richard.sandiford@....com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H . Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	"Dmitry V . Levin" <ldv@...linux.org>,
	Emese Revfy <re.emese@...il.com>, Jonathan Corbet <corbet@....net>,
	Andrey Ryabinin <aryabinin@...tuozzo.com>,
	"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
	Thomas Garnier <thgarnie@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Alexei Starovoitov <ast@...nel.org>, Josef Bacik <jbacik@...com>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Nicholas Piggin <npiggin@...il.com>,
	Al Viro <viro@...iv.linux.org.uk>,
	"David S . Miller" <davem@...emloft.net>,
	Ding Tianhong <dingtianhong@...wei.com>,
	David Woodhouse <dwmw@...zon.co.uk>,
	Josh Poimboeuf <jpoimboe@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Dominik Brodowski <linux@...inikbrodowski.net>,
	Juergen Gross <jgross@...e.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Dan Williams <dan.j.williams@...el.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Mathias Krause <minipli@...glemail.com>,
	Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
	Kyle Huey <me@...ehuey.com>,
	Dmitry Safonov <dsafonov@...tuozzo.com>,
	Will Deacon <will.deacon@....com>, Arnd Bergmann <arnd@...db.de>,
	Florian Weimer <fweimer@...hat.com>,
	Boris Lukashev <blukashev@...pervictus.com>,
	Andrey Konovalov <andreyknvl@...gle.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v13 (resend) 2/6] x86/entry: Add STACKLEAK erasing the
 kernel stack at the end of syscalls


* Alexander Popov <alex.popov@...ux.com> wrote:

> The STACKLEAK feature erases the kernel stack before returning from
> syscalls. That reduces the information which kernel stack leak bugs can
> reveal and blocks some uninitialized stack variable attacks. Moreover,
> STACKLEAK blocks kernel stack depth overflow caused by alloca (aka
> Stack Clash attack).
> 
> This commit introduces the code filling the used part of the kernel
> stack with a poison value before returning to the userspace. Full
> STACKLEAK feature also contains the gcc plugin which comes in a
> separate commit.
> 
> The STACKLEAK feature is ported from grsecurity/PaX. More information at:
>   https://grsecurity.net/
>   https://pax.grsecurity.net/
> 
> This code is modified from Brad Spengler/PaX Team's code in the last
> public patch of grsecurity/PaX based on our understanding of the code.
> Changes or omissions from the original code are ours and don't reflect
> the original grsecurity/PaX code.
> 
> Signed-off-by: Alexander Popov <alex.popov@...ux.com>
> Acked-by: Thomas Gleixner <tglx@...utronix.de>
> Reviewed-by: Dave Hansen <dave.hansen@...ux.intel.com>
> ---
>  Documentation/x86/x86_64/mm.txt  |  2 ++
>  arch/Kconfig                     | 27 +++++++++++++++++
>  arch/x86/Kconfig                 |  1 +
>  arch/x86/entry/calling.h         | 14 +++++++++
>  arch/x86/entry/entry_32.S        |  7 +++++
>  arch/x86/entry/entry_64.S        |  3 ++
>  arch/x86/entry/entry_64_compat.S |  5 ++++
>  include/linux/sched.h            |  4 +++
>  include/linux/stackleak.h        | 24 +++++++++++++++
>  kernel/Makefile                  |  4 +++
>  kernel/fork.c                    |  3 ++
>  kernel/stackleak.c               | 63 ++++++++++++++++++++++++++++++++++++++++
>  12 files changed, 157 insertions(+)
>  create mode 100644 include/linux/stackleak.h
>  create mode 100644 kernel/stackleak.c
> 
> diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
> index 5432a96..600bc2a 100644
> --- a/Documentation/x86/x86_64/mm.txt
> +++ b/Documentation/x86/x86_64/mm.txt
> @@ -24,6 +24,7 @@ ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
>  [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
>  ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
>  ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
> +STACKLEAK_POISON value in this last hole: ffffffffffff4111
>  
>  Virtual memory map with 5 level page tables:
>  
> @@ -50,6 +51,7 @@ ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space
>  [fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
>  ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
>  ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
> +STACKLEAK_POISON value in this last hole: ffffffffffff4111
>  
>  Architecture defines a 64-bit virtual address. Implementations can support
>  less. Currently supported are 48- and 57-bit virtual addresses. Bits 63
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 1aa5906..57817f0 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -414,6 +414,13 @@ config PLUGIN_HOSTCC
>  	  Host compiler used to build GCC plugins.  This can be $(HOSTCXX),
>  	  $(HOSTCC), or a null string if GCC plugin is unsupported.
>  
> +config HAVE_ARCH_STACKLEAK
> +	bool
> +	help
> +	  An architecture should select this if it has the code which
> +	  fills the used part of the kernel stack with the STACKLEAK_POISON
> +	  value before returning from system calls.
> +
>  config HAVE_GCC_PLUGINS
>  	bool
>  	help
> @@ -549,6 +556,26 @@ config GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
>  	  in structures.  This reduces the performance hit of RANDSTRUCT
>  	  at the cost of weakened randomization.
>  
> +config GCC_PLUGIN_STACKLEAK
> +	bool "Erase the kernel stack before returning from syscalls"
> +	depends on GCC_PLUGINS
> +	depends on HAVE_ARCH_STACKLEAK
> +	help
> +	  This option makes the kernel erase the kernel stack before
> +	  returning from system calls. That reduces the information which
> +	  kernel stack leak bugs can reveal and blocks some uninitialized
> +	  stack variable attacks. This option also blocks kernel stack depth
> +	  overflow caused by alloca (aka Stack Clash attack).

Nit, please pick one of these variants to refer to library functions:

	  overflow caused by 'alloca' (aka Stack Clash attack).
	  overflow caused by alloca() (aka Stack Clash attack).

Like you correctly did later on in a C comment block:

> + * STACKLEAK blocks stack depth overflow caused by alloca() (aka Stack Clash
> + * attack).
> + */


> +	  The tradeoff is the performance impact: on a single CPU system kernel
> +	  compilation sees a 1% slowdown, other systems and workloads may vary
> +	  and you are advised to test this feature on your expected workload
> +	  before deploying it.

Is there a way to patch this out runtime? I.e. if a distro enabled it, is there an 
easy way to disable much of the overhead without rebooting the kernel?

> --- /dev/null
> +++ b/include/linux/stackleak.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_STACKLEAK_H
> +#define _LINUX_STACKLEAK_H
> +
> +#include <linux/sched.h>
> +#include <linux/sched/task_stack.h>
> +
> +/*
> + * Check that the poison value points to the unused hole in the
> + * virtual memory map for your platform.
> + */
> +#define STACKLEAK_POISON -0xBEEF
> +
> +#define STACKLEAK_POISON_CHECK_DEPTH 128
> +
> +static inline void stackleak_task_init(struct task_struct *task)
> +{
> +#ifdef CONFIG_GCC_PLUGIN_STACKLEAK
> +	task->lowest_stack = (unsigned long)end_of_stack(task) +
> +						sizeof(unsigned long);
> +#endif

Please don't break the line in such an ugly fashion - just keep the line long and 
ignore checkpatch, because the cure is worse than the disease.

> --- /dev/null
> +++ b/kernel/stackleak.c
> @@ -0,0 +1,63 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This code fills the used part of the kernel stack with a poison value
> + * before returning to the userspace. It's a part of the STACKLEAK feature
> + * ported from grsecurity/PaX.

s/returning to the userspace
 /returning to userspace

s/It's a part of the STACKLEAK feature
 /It's part of the STACKLEAK feature

> + * Author: Alexander Popov <alex.popov@...ux.com>
> + *
> + * STACKLEAK reduces the information which kernel stack leak bugs can
> + * reveal and blocks some uninitialized stack variable attacks. Moreover,
> + * STACKLEAK blocks stack depth overflow caused by alloca() (aka Stack Clash
> + * attack).
> + */
> +
> +#include <linux/stackleak.h>
> +
> +asmlinkage void stackleak_erase_kstack(void)

s/stackleak_erase_kstack
 /stackleak_erase

?

> +{
> +	/* It would be nice not to have 'kstack_ptr' and 'boundary' on stack */
> +	unsigned long kstack_ptr = current->lowest_stack;
> +	unsigned long boundary = kstack_ptr & ~(THREAD_SIZE - 1);
> +	unsigned int poison_count = 0;
> +	const unsigned int check_depth =
> +			STACKLEAK_POISON_CHECK_DEPTH / sizeof(unsigned long);

ugly linebreak.

> +
> +	/* Search for the poison value in the kernel stack */
> +	while (kstack_ptr > boundary && poison_count <= check_depth) {
> +		if (*(unsigned long *)kstack_ptr == STACKLEAK_POISON)
> +			poison_count++;
> +		else
> +			poison_count = 0;
> +
> +		kstack_ptr -= sizeof(unsigned long);
> +	}
> +
> +	/*
> +	 * One 'long int' at the bottom of the thread stack is reserved and
> +	 * should not be poisoned (see CONFIG_SCHED_STACK_END_CHECK).

Nit:

  s/CONFIG_SCHED_STACK_END_CHECK
   /CONFIG_SCHED_STACK_END_CHECK=y

> +	 */
> +	if (kstack_ptr == boundary)
> +		kstack_ptr += sizeof(unsigned long);
> +
> +	/*
> +	 * Now write the poison value to the kernel stack. Start from
> +	 * 'kstack_ptr' and move up till the new 'boundary'. We assume that
> +	 * the stack pointer doesn't change when we write poison.
> +	 */
> +	if (on_thread_stack())
> +		boundary = current_stack_pointer;
> +	else
> +		boundary = current_top_of_stack();
> +
> +	BUG_ON(boundary - kstack_ptr >= THREAD_SIZE);

Should never happen, right? If so then please make this:

	if (WARN_ON(boundary - kstack_ptr >= THREAD_SIZE))
		return;

or so, to make it non-fatal and to allow users to report it, should it trigger 
against all expectations.

> +
> +	while (kstack_ptr < boundary) {
> +		*(unsigned long *)kstack_ptr = STACKLEAK_POISON;
> +		kstack_ptr += sizeof(unsigned long);
> +	}
> +
> +	/* Reset the 'lowest_stack' value for the next syscall */
> +	current->lowest_stack = current_top_of_stack() - THREAD_SIZE / 64;
> +}

Nit, I'd write this as:

	current->lowest_stack = current_top_of_stack() - THREAD_SIZE/64;

to make it group better visually. (Again, ignore checkpatch if it complains, it's 
wrong.)

Overall I like the interface cleanups in v13, so if these nits are addressed and 
it becomes possible for (root users) to disable the checking then I suppose this 
is fine.

Thanks,

	Ingo

Powered by blists - more mailing lists

Your e-mail address:

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.