Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 29 Jun 2017 12:13:17 +0800
From: Li Kun <hw.likun@...wei.com>
To: Kees Cook <keescook@...omium.org>, <linux-kernel@...r.kernel.org>
CC: Christoph Hellwig <hch@...radead.org>,
        Peter Zijlstra
	<peterz@...radead.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        "Andrew
 Morton" <akpm@...ux-foundation.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        "PaX Team" <pageexec@...email.hu>, Jann Horn <jannh@...gle.com>,
        Eric Biggers
	<ebiggers3@...il.com>,
        Elena Reshetova <elena.reshetova@...el.com>,
        "Hans
 Liljestrand" <ishkamiel@...il.com>,
        David Windsor <dwindsor@...il.com>,
        "Greg KH" <gregkh@...uxfoundation.org>, Ingo Molnar <mingo@...hat.com>,
        "Alexey Dobriyan" <adobriyan@...il.com>,
        "Serge E. Hallyn"
	<serge@...lyn.com>, <arozansk@...hat.com>,
        Davidlohr Bueso
	<dave@...olabs.net>,
        Manfred Spraul <manfred@...orfullife.com>,
        "axboe@...nel.dk" <axboe@...nel.dk>,
        "James Bottomley"
	<James.Bottomley@...senpartnership.com>,
        "x86@...nel.org" <x86@...nel.org>, Ingo Molnar <mingo@...nel.org>,
        Arnd Bergmann <arnd@...db.de>,
        "David S.
 Miller" <davem@...emloft.net>,
        Rik van Riel <riel@...hat.com>,
        linux-arch
	<linux-arch@...r.kernel.org>,
        "kernel-hardening@...ts.openwall.com"
	<kernel-hardening@...ts.openwall.com>,
        "Wangkai (Morgan, Euler)"
	<morgan.wang@...wei.com>
Subject: Re: [PATCH v5 3/3] x86/refcount: Implement fast
 refcount overflow protection

Hi Kees,


在 2017/5/31 5:39, Kees Cook 写道:
> This protection is a modified version of the x86 PAX_REFCOUNT defense
> from PaX/grsecurity. This speeds up the refcount_t API by duplicating
> the existing atomic_t implementation with a single instruction added to
> detect if the refcount has wrapped past INT_MAX (or below 0) resulting
> in a negative value, where the handler then restores the refcount_t to
> INT_MAX or saturates to INT_MIN / 2. With this overflow protection, the
> use-after-free following a refcount_t wrap is blocked from happening,
> avoiding the vulnerability entirely.
>
> While this defense only perfectly protects the overflow case, as that
> can be detected and stopped before the reference is freed and left to be
> abused by an attacker, it also notices some of the "inc from 0" and "below
> 0" cases. However, these only indicate that a use-after-free has already
> happened. Such notifications are likely avoidable by an attacker that has
> already exploited a use-after-free vulnerability, but it's better to have
> them than allow such conditions to remain universally silent.
>
> On overflow detection (actually "negative value" detection), the refcount
> value is reset to INT_MAX, the offending process is killed, and a report
> and stack trace are generated. This allows the system to attempt to
> keep operating. In the case of a below-zero decrement or other negative
> value results, the refcount is saturated to INT_MIN / 2 to keep it from
> reaching zero again. (For the INT_MAX reset, another option would be to
> choose (INT_MAX - N) with some small N to provide some headroom for
> legitimate users of the reference counter.)
>
> On the matter of races, since the entire range beyond INT_MAX but before 0
> is negative, every inc will trap, leaving no overflow-only race condition.
>
> As for performance, this implementation adds a single "js" instruction to
> the regular execution flow of a copy of the regular atomic_t operations.
> Since this is a forward jump, it is by default the non-predicted path,
> which will be reinforced by dynamic branch prediction. The result is
> this protection having no measurable change in performance over standard
> atomic_t operations. The error path, located in .text.unlikely, saves
> the refcount location and then uses UD0 to fire a refcount exception
> handler, which resets the refcount, reports the error, marks the process
> to be killed, and returns to regular execution. This keeps the changes to
> .text size minimal, avoiding return jumps and open-coded calls to the
> error reporting routine.
>
> Assembly comparison:
>
> atomic_inc
> .text:
> ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
>
> refcount_inc
> .text:
> ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
> ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
> ...
> .text.unlikely:
> ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
> ffffffff816c36d7:       0f ff                   (bad)
>
> Thanks to PaX Team for various suggestions for improvement.
>
> Signed-off-by: Kees Cook <keescook@...omium.org>
> Reviewed-by: Josh Poimboeuf <jpoimboe@...hat.com>
> ---
>   arch/Kconfig                    |  9 +++++
>   arch/x86/Kconfig                |  1 +
>   arch/x86/include/asm/asm.h      |  6 +++
>   arch/x86/include/asm/refcount.h | 87 +++++++++++++++++++++++++++++++++++++++++
>   arch/x86/mm/extable.c           | 40 +++++++++++++++++++
>   include/linux/kernel.h          |  6 +++
>   include/linux/refcount.h        |  4 ++
>   kernel/panic.c                  | 22 +++++++++++
>   8 files changed, 175 insertions(+)
>   create mode 100644 arch/x86/include/asm/refcount.h
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index fba3bf186728..e9445ac0e899 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -867,6 +867,15 @@ config STRICT_MODULE_RWX
>   config ARCH_WANT_RELAX_ORDER
>   	bool
>   
> +config ARCH_HAS_REFCOUNT
> +	bool
> +	help
> +	  An architecture selects this when it has implemented refcount_t
> +	  using primitizes that provide a faster runtime at the expense
> +	  of some full refcount state checks. The refcount overflow condition,
> +	  however, must be retained. Catching overflows is the primary
> +	  security concern for protecting against bugs in reference counts.
> +
>   config REFCOUNT_FULL
>   	bool "Perform full reference count validation at the expense of speed"
>   	help
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cd18994a9555..65525f76b27c 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -54,6 +54,7 @@ config X86
>   	select ARCH_HAS_KCOV			if X86_64
>   	select ARCH_HAS_MMIO_FLUSH
>   	select ARCH_HAS_PMEM_API		if X86_64
> +	select ARCH_HAS_REFCOUNT
>   	select ARCH_HAS_SET_MEMORY
>   	select ARCH_HAS_SG_CHAIN
>   	select ARCH_HAS_STRICT_KERNEL_RWX
> diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
> index 7a9df3beb89b..676ee5807d86 100644
> --- a/arch/x86/include/asm/asm.h
> +++ b/arch/x86/include/asm/asm.h
> @@ -74,6 +74,9 @@
>   # define _ASM_EXTABLE_EX(from, to)				\
>   	_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
>   
> +# define _ASM_EXTABLE_REFCOUNT(from, to)			\
> +	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
> +
>   # define _ASM_NOKPROBE(entry)					\
>   	.pushsection "_kprobe_blacklist","aw" ;			\
>   	_ASM_ALIGN ;						\
> @@ -123,6 +126,9 @@
>   # define _ASM_EXTABLE_EX(from, to)				\
>   	_ASM_EXTABLE_HANDLE(from, to, ex_handler_ext)
>   
> +# define _ASM_EXTABLE_REFCOUNT(from, to)			\
> +	_ASM_EXTABLE_HANDLE(from, to, ex_handler_refcount)
> +
>   /* For C file, we already have NOKPROBE_SYMBOL macro */
>   #endif
>   
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> new file mode 100644
> index 000000000000..aaf9bb3abd71
> --- /dev/null
> +++ b/arch/x86/include/asm/refcount.h
> @@ -0,0 +1,87 @@
> +#ifndef __ASM_X86_REFCOUNT_H
> +#define __ASM_X86_REFCOUNT_H
> +/*
> + * x86-specific implementation of refcount_t. Ported from PAX_REFCOUNT
> + * from PaX/grsecurity.
> + */
> +#include <linux/refcount.h>
> +
> +#define _REFCOUNT_EXCEPTION				\
> +	".pushsection .text.unlikely\n"			\
> +	"111:\tlea %[counter], %%" _ASM_CX "\n"		\
> +	"112:\t" ASM_UD0 "\n"				\
> +	".popsection\n"					\
> +	"113:\n"					\
> +	_ASM_EXTABLE_REFCOUNT(112b, 113b)
> +
> +#define REFCOUNT_CHECK					\
> +	"js 111f\n\t"					\
> +	_REFCOUNT_EXCEPTION
> +
> +#define REFCOUNT_ERROR					\
> +	"jmp 111f\n\t"					\
> +	_REFCOUNT_EXCEPTION
> +
> +static __always_inline void refcount_add(unsigned int i, refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "addl %1,%0\n\t"
> +		REFCOUNT_CHECK
> +		: [counter] "+m" (r->refs.counter)
> +		: "ir" (i)
> +		: "cc", "cx");
> +}
> +
> +static __always_inline void refcount_inc(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "incl %0\n\t"
> +		REFCOUNT_CHECK
> +		: [counter] "+m" (r->refs.counter)
> +		: : "cc", "cx");
> +}
> +
> +static __always_inline void refcount_dec(refcount_t *r)
> +{
> +	asm volatile(LOCK_PREFIX "decl %0\n\t"
> +		REFCOUNT_CHECK
> +		: [counter] "+m" (r->refs.counter)
> +		: : "cc", "cx");
> +}
> +
> +static __always_inline __must_check
> +bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> +{
> +	GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl", REFCOUNT_CHECK,
> +				  r->refs.counter, "er", i, "%0", e);
> +}
> +
> +static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
> +{
> +	GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl", REFCOUNT_CHECK,
> +				 r->refs.counter, "%0", e);
> +}
> +
> +static __always_inline __must_check bool refcount_inc_not_zero(refcount_t *r)
> +{
> +	int c;
> +
> +	c = atomic_read(&(r->refs));
> +	do {
> +		if (unlikely(c <= 0 || c == INT_MAX))
> +			break;
> +	} while (!atomic_try_cmpxchg(&(r->refs), &c, c + 1));
> +
> +	/* Did we try to increment from an undesirable state? */
> +	if (unlikely(c < 0 || c == INT_MAX)) {
> +		/*
> +		 * Since the overflow flag will have been reset, this will
> +		 * always saturate.
> +		 */
> +		asm volatile(REFCOUNT_ERROR
> +			     : : [counter] "m" (r->refs.counter)
> +			     : "cc", "cx");
> +	}
> +
> +	return c != 0;
> +}
> +
> +#endif
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 35ea061010a1..33324dfe8799 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -36,6 +36,46 @@ bool ex_handler_fault(const struct exception_table_entry *fixup,
>   }
>   EXPORT_SYMBOL_GPL(ex_handler_fault);
>   
> +/*
> + * Handler for UD0 exception following a negative-value test (via "js"
> + * instruction) against the result of a refcount inc/dec/add/sub.
> + */
> +bool ex_handler_refcount(const struct exception_table_entry *fixup,
> +			 struct pt_regs *regs, int trapnr)
> +{
> +	int reset;
> +
> +	/*
> +	 * If we crossed from INT_MAX to INT_MIN, the OF flag (result
> +	 * wrapped around) and the SF flag (result is negative) will be
> +	 * set. In this case, reset to INT_MAX in an attempt to leave the
> +	 * refcount usable. Otherwise, we've landed here due to producing
> +	 * a negative result from either decrementing zero or operating on
> +	 * a negative value. In this case things are badly broken, so we
> +	 * we saturate to INT_MIN / 2.
> +	 */
> +	if (regs->flags & (X86_EFLAGS_OF | X86_EFLAGS_SF))
> +		reset = INT_MAX;
     Should it be like this to indicate that the refcount is wapped from 
INT_MAX to INT_MIN ?

	if (regs->flags & (X86_EFLAGS_OF | X86_EFLAGS_SF)
		== (X86_EFLAGS_OF | X86_EFLAGS_SF))
		reset = INT_MAX;

> +	else
> +		reset = INT_MIN / 2;
> +	*(int *)regs->cx = reset;
> +
> +	/*
> +	 * Strictly speaking, this reports the fixup destination, not
> +	 * the fault location, and not the actually overflowing
> +	 * instruction, which is the instruction before the "js", but
> +	 * since that instruction could be a variety of lengths, just
> +	 * report the location after the overflow, which should be close
> +	 * enough for finding the overflow, as it's at least back in
> +	 * the function, having returned from .text.unlikely.
> +	 */
> +	regs->ip = ex_fixup_addr(fixup);
> +	refcount_error_report(regs);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_GPL(ex_handler_refcount);
> +
>   bool ex_handler_ext(const struct exception_table_entry *fixup,
>   		   struct pt_regs *regs, int trapnr)
>   {
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 13bc08aba704..b9a842750e08 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -276,6 +276,12 @@ extern int oops_may_print(void);
>   void do_exit(long error_code) __noreturn;
>   void complete_and_exit(struct completion *, long) __noreturn;
>   
> +#ifdef CONFIG_ARCH_HAS_REFCOUNT
> +void refcount_error_report(struct pt_regs *regs);
> +#else
> +static inline void refcount_error_report(struct pt_regs *regs) { }
> +#endif
> +
>   /* Internal, do not use. */
>   int __must_check _kstrtoul(const char *s, unsigned int base, unsigned long *res);
>   int __must_check _kstrtol(const char *s, unsigned int base, long *res);
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 68ecb431dbab..8750fbfe8476 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -54,6 +54,9 @@ extern void refcount_sub(unsigned int i, refcount_t *r);
>   extern __must_check bool refcount_dec_and_test(refcount_t *r);
>   extern void refcount_dec(refcount_t *r);
>   #else
> +# ifdef CONFIG_ARCH_HAS_REFCOUNT
> +#  include <asm/refcount.h>
> +# else
>   static inline __must_check bool refcount_add_not_zero(unsigned int i,
>   						      refcount_t *r)
>   {
> @@ -95,6 +98,7 @@ static inline void refcount_dec(refcount_t *r)
>   {
>   	atomic_dec(&r->refs);
>   }
> +# endif /* !CONFIG_ARCH_HAS_REFCOUNT */
>   #endif /* CONFIG_REFCOUNT_FULL */
>   
>   extern __must_check bool refcount_dec_if_one(refcount_t *r);
> diff --git a/kernel/panic.c b/kernel/panic.c
> index a58932b41700..fb8576ce1638 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -26,6 +26,7 @@
>   #include <linux/nmi.h>
>   #include <linux/console.h>
>   #include <linux/bug.h>
> +#include <linux/ratelimit.h>
>   
>   #define PANIC_TIMER_STEP 100
>   #define PANIC_BLINK_SPD 18
> @@ -601,6 +602,27 @@ EXPORT_SYMBOL(__stack_chk_fail);
>   
>   #endif
>   
> +#ifdef CONFIG_ARCH_HAS_REFCOUNT
> +static DEFINE_RATELIMIT_STATE(refcount_ratelimit, 15 * HZ, 3);
> +
> +void refcount_error_report(struct pt_regs *regs)
> +{
> +	/* Always make sure triggering process will be terminated. */
> +	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, current, true);
> +
> +	if (!__ratelimit(&refcount_ratelimit))
> +		return;
> +
> +	pr_emerg("refcount overflow detected in: %s:%d, uid/euid: %u/%u\n",
> +		current->comm, task_pid_nr(current),
> +		from_kuid_munged(&init_user_ns, current_uid()),
> +		from_kuid_munged(&init_user_ns, current_euid()));
> +	print_symbol(KERN_EMERG "refcount error occurred at: %s\n",
> +		instruction_pointer(regs));
> +	show_regs(regs);
> +}
> +#endif
> +
>   core_param(panic, panic_timeout, int, 0644);
>   core_param(pause_on_oops, pause_on_oops, int, 0644);
>   core_param(panic_on_warn, panic_on_warn, int, 0644);

-- 
Best Regards
Li Kun

Powered by blists - more mailing lists

Your e-mail address:

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.