Date: Mon, 8 May 2017 06:09:28 -0700 From: Kees Cook <keescook@...omium.org> To: Ingo Molnar <mingo@...nel.org> Cc: Thomas Garnier <thgarnie@...gle.com>, Martin Schwidefsky <schwidefsky@...ibm.com>, Heiko Carstens <heiko.carstens@...ibm.com>, Dave Hansen <dave.hansen@...el.com>, Arnd Bergmann <arnd@...db.de>, Thomas Gleixner <tglx@...utronix.de>, David Howells <dhowells@...hat.com>, René Nyffenegger <mail@...enyffenegger.ch>, Andrew Morton <akpm@...ux-foundation.org>, "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>, "Eric W . Biederman" <ebiederm@...ssion.com>, Oleg Nesterov <oleg@...hat.com>, Pavel Tikhomirov <ptikhomirov@...tuozzo.com>, Ingo Molnar <mingo@...hat.com>, "H . Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>, Paolo Bonzini <pbonzini@...hat.com>, Rik van Riel <riel@...hat.com>, Josh Poimboeuf <jpoimboe@...hat.com>, Borislav Petkov <bp@...en8.de>, Brian Gerst <brgerst@...il.com>, "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>, Christian Borntraeger <borntraeger@...ibm.com>, Russell King <linux@...linux.org.uk>, Will Deacon <will.deacon@....com>, Catalin Marinas <catalin.marinas@....com>, Mark Rutland <mark.rutland@....com>, James Morse <james.morse@....com>, linux-s390 <linux-s390@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, Linux API <linux-api@...r.kernel.org>, "the arch/x86 maintainers" <x86@...nel.org>, "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, Kernel Hardening <kernel-hardening@...ts.openwall.com>, Linus Torvalds <torvalds@...ux-foundation.org>, Peter Zijlstra <a.p.zijlstra@...llo.nl> Subject: Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode On Mon, May 8, 2017 at 12:33 AM, Ingo Molnar <mingo@...nel.org> wrote: > > (added more Cc:s) > > * Thomas Garnier <thgarnie@...gle.com> wrote: > >> On Fri, Apr 28, 2017 at 8:32 AM, Thomas Garnier <thgarnie@...gle.com> wrote: >> > Ensure that a syscall does not return to user-mode with a kernel address >> > limit. If that happens, a process can corrupt kernel-mode memory and >> > elevate privileges . >> > >> > The CONFIG_ADDR_LIMIT_CHECK option disables the generic check so each >> > architecture can create optimized versions. This option is enabled by >> > default on s390 because a similar feature already exists. >> > >> >  https://bugs.chromium.org/p/project-zero/issues/detail?id=990 >> > >> > Signed-off-by: Thomas Garnier <thgarnie@...gle.com> >> > Tested-by: Kees Cook <keescook@...omium.org> >> >> Ingo: Do you want to take the set? > > Yeah, so now I'm questioning the whole premise of the feature, sorry :-/ > > A big disavantage is that the "security check" will add 2-5 instructions to the > system call fast path. Every one of them, and essentially forever. Just to handle > a CVE that was caused by a buggy touch-screen driver helper function leaking > KERNEL_DS and which was fixed long ago ... > > And yes, I realize that there were other such bugs and that such bugs might occur > in the future - but why not push the overhead of the security check to the kernel > build phase? I.e. I'm wondering how well we could do static analysis during kernel > build - would a limited mode of Sparse be good enough for that? Or we could add a > new static checker to tools/, built from first principles and used primarily for > extended syntactical checking. Static analysis is just not going to cover all cases. We've had vulnerabilities where interrupt handlers left KERNEL_DS set, for example. If there are performance concerns, let's put this behind a CONFIG. 2-5 instructions is not an issue for most people that want this coverage. > For example I'd consider it a good practice to mandate that if a kernel function > sets KERNEL_DS then it must restore it as well. Any function that does not do > that, or is too complex for the static analysis to prove correctness for sure > should be considered buggy! > > Are there any common kernel APIs outside set_fs() that set KERNEL_DS > intentionally? The overwhelming pattern ought to be: > > orig_fs = get_fs(); > set_fs(KERNEL_DS); > ... > set_fs(orig_fs); > > ... and even a relatively simple static analysis tool ought to be able to see > through that. This pattern was, in fact, what the interrupt handler bug escaped from. We have to build proactive defenses, and this check has a clear defensive advantage. It's a noble goal to improve the static analyzers and simplify the source, but we have too much history to prove that this just isn't enough. This instruction cost of this is extremely small, too. Until we can eliminate set_fs(), we need to add this check. > I'd even suggest we do it not like Sparse builds are done today, but in a more > integrated fashion: do static analysis as part of a typical kernel defconfig build > and not tolerate warnings but go for a 'zero warnings' policy like Linus uses for > modconfig builds. > > _That_ solution I'd feel very, very good about - it would be so much better than > any runtime checks... I'm not opposed to this, but there will be push-back on "making the build slower", and it still won't catch everything. Bug-finding is different from making a bug class just unexploitable at all. As we've done before, it's the difference between trying to find format string attacks vs just removing %n from the format parser. > Not to mention that such an integrated static analysis facility would allow many > other things to be checked during build time, which we couldn't possibly check > runtime. Absolutely! But it's orthogonal to proactive runtime exploit blocking. We've got one that works and defends against an entire class of vulnerability for very low cost. It it's truly too costly for default, let's put it behind a CONFIG and see who wants it. (Most distros, I suspect, will enable it, just like hardened usercopy which is much more expensive than this.) -Kees -- Kees Cook Pixel Security
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.