Date: Fri, 10 Nov 2017 06:57:55 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Hector Martin 'marcan'" <marcan@...can.st>
Cc: LKML <linux-kernel@...r.kernel.org>, 
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>, X86 ML <x86@...nel.org>
Subject: Re: vDSO maximum stack usage, stack probes, and -fstack-check

On Fri, Nov 10, 2017 at 2:40 AM, Hector Martin 'marcan'
<marcan@...can.st> wrote:
> As far as I know, the vDSO specs (both Documentation/ABI/stable/vdso and
> `man 7 vdso`) make no mention of how much stack the vDSO functions are
> allowed to use. They just say "the usual C ABI", which makes no guarantees.
>
> It turns out that Go has been assuming that those functions use less
> than 104 bytes of stack space, because it calls them directly on its
> tiny stack allocations with no guard pages or other hardware overflow
> protection [1]. On most systems, this is fine.
>
> However, on my system the stars aligned and turned it into a
> nondeterministic crash. I use Gentoo Hardened, which builds its
> toolchain with -fstack-check on by default. It turns out that with the
> combination of GCC 6.4.0, -fstack-check, linux-4.13.9-gentoo, and
> CONFIG_OPTIMIZE_INLINING=n, gcc decides to *not* inline vread_tsc (it's
> not marked inline, so it's perfectly within its right not to do that,
> though for some reason it does inline when CONFIG_OPTIMIZE_INLINING=y
> even though that nominally gives it greater freedom *not* to inline
> things marked inline). That turns __vdso_clock_gettime and
> __vdso_gettimeofday into non-leaf functions, and GCC then inserts a
> stack probe (full objdump at [2]):
>
> 0000000000000030 <__vdso_clock_gettime>:
>   30:   55                      push   %rbp
>   31:   48 89 e5                mov    %rsp,%rbp
>   34:   48 81 ec 20 10 00 00    sub    $0x1020,%rsp
>   3b:   48 83 0c 24 00          orq    $0x0,(%rsp)
>   40:   48 81 c4 20 10 00 00    add    $0x1020,%rsp

This code is so wrong I don't even know where to start.  Seriously, sub,
orq, add?  How about just orq with an offset?  How about a *load*
instead of a store?

But stepping back even further, an offset > 4096 is just bogus.
That's big enough to skip right over the guard page.
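
To make the arithmetic concrete, here is a rough sketch in C of where a
probe lands relative to a single guard page.  The 0x1020 displacement is
taken from the quoted objdump; the 4096-byte guard size, the function
name, and the enum are assumptions made up for illustration, not
anything GCC or the kernel defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of where a stack probe lands. */
enum probe_result {
    PROBE_OK,          /* lands in ordinary stack memory above the guard */
    PROBE_HITS_GUARD,  /* lands inside the guard region and would fault */
    PROBE_SKIPS_GUARD  /* lands below the guard region, which never sees it */
};

/*
 * Model: the guard region spans guard_size bytes, and the stack pointer
 * sits room_above_guard bytes above the top of that region.  The probe
 * touches the address probe_offset bytes below the stack pointer.
 */
enum probe_result classify_probe(size_t probe_offset,
                                 size_t room_above_guard,
                                 size_t guard_size)
{
    assert(guard_size > 0);

    if (probe_offset <= room_above_guard)
        return PROBE_OK;
    if (probe_offset <= room_above_guard + guard_size)
        return PROBE_HITS_GUARD;
    return PROBE_SKIPS_GUARD;
}
```

Under this model, classify_probe(0x1020, 0, 4096) gives PROBE_SKIPS_GUARD:
0x1020 is 4128 bytes, so with the stack pointer right at the edge of a
4096-byte guard page the probe lands 32 bytes past it.  With less than
32 bytes of room the guard page is missed entirely.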

Anyway, my recollection is that GCC's stack check code is busted until
much newer gcc versions.  I suppose we could try to make the kernel
fail to build at all on a broken configuration like this.

--Andy
