Date: Sun, 21 Jan 2018 18:23:59 -0800
From: Andy Lutomirski <luto@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...nel.org>, Jann Horn <jannh@...gle.com>,
	Dan Williams <dan.j.williams@...el.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-arch <linux-arch@...r.kernel.org>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"the arch/x86 maintainers" <x86@...nel.org>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Alan Cox <alan@...ux.intel.com>
Subject: Re: Re: [PATCH v4.1 07/10] x86: narrow out of bounds syscalls to sys_read under speculation

On Sun, Jan 21, 2018 at 6:04 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> On Sun, Jan 21, 2018 at 5:38 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>
>> 3. What's with sbb; and? I can see two sane ways to do this. One is
>> cmovaq [something safe], %rax,
>
> Heh. I think it's partly about being old-fashioned. sbb has always
> been around, and is the traditional trick for 0/-1.
>
> Also, my original suggested thing did the *access* too, and masked the
> result with the same mask.
>
> But I guess we could use cmov instead. It has very similar performance
> (ie it was relatively slow on P4, but so was sbb).
>
> However, I suspect it actually has a slightly higher register
> pressure, since you'd need to have that zero register (zero being the
> "safe" value), and the only good way to get a zero value is the xor
> thing, which affects flags and thus needs to be before the cmp.
>
> In contrast, the sbb trick has no early inputs needed.
>
> So on the whole, 'cmov' may be more natural on a conceptual level, but
> the sbb trick really is a very "traditional x86 thing" to do.

Fair enough. That being said, what I *actually* want to do is to nuke
this thing entirely. I just wrote a patch to turn off the SYSCALL64
fast path entirely when retpolines are on. Then this issue can be
dealt with in C. I assume someone has a brilliant way to make gcc
automatically do something intelligent about guarded array access in
C. </snicker>

Seriously, though, the retpolined fast path is actually slower than
the slow path on a "minimal" retpoline kernel (which is what I'm using
because Fedora hasn't pushed out a retpoline compiler yet), and I
doubt it's more than the tiniest speedup on a full retpoline kernel.

I've read a bunch of emails flying around saying that retpolines
aren't that bad. In very informal benchmarking, a single mispredicted
ret (which is what a retpoline is) seems to take 20-30 cycles on
Skylake. That's pretty far from "not bad". Is IBRS somehow doing
something that adversely affects code that doesn't use indirect
branches? Because I'm having a bit of a hard time imagining IBRS
hurting indirect branches worse than retpolines do.
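
For readers following the "sbb; and" exchange above, the trick boils
down to generating an all-ones/all-zero mask without a conditional
branch. A minimal C sketch follows; the function name and exact
constraints are illustrative, not quoted from the patch series under
review:

static inline unsigned long index_mask_nospec(unsigned long index,
					      unsigned long size)
{
	unsigned long mask;

	/*
	 * cmp computes index - size and sets CF when index < size
	 * (unsigned borrow); sbb of a register with itself then yields
	 * 0 - CF, i.e. ~0UL for an in-bounds index and 0 otherwise.
	 * Unlike a cmov-based version, no pre-zeroed "safe value"
	 * register is needed, which is the register-pressure point
	 * made above.
	 */
	asm volatile ("cmp %1, %2\n\t"
		      "sbb %0, %0"
		      : "=r" (mask)
		      : "g" (size), "r" (index)
		      : "cc");
	return mask;
}

The caller clamps the index before any dependent load, e.g.
"nr &= index_mask_nospec(nr, NR_syscalls);", so a speculated
out-of-bounds syscall number collapses to 0 (sys_read on x86-64)
rather than indexing past the table -- hence the subject line.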
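
For context on the "mispredicted ret" remark: a retpoline replaces an
indirect "jmp *%rax" or "call *%rax" with a return trampoline along
the lines of the published sequence sketched below (symbol and label
names are illustrative):

	.text
	.globl	indirect_thunk_rax
indirect_thunk_rax:
	call	.Ldo_rax		/* pushes .Lspec_trap as the return address */
.Lspec_trap:
	pause				/* speculation of the ret lands here... */
	lfence				/* ...and is pinned in this tight loop */
	jmp	.Lspec_trap
.Ldo_rax:
	mov	%rax, (%rsp)		/* overwrite return address with real target */
	ret				/* architecturally jumps to *%rax */

The indirect branch predictor never sees the real target, but the
return stack buffer predicts that the final ret goes back to
.Lspec_trap, so every trip through the thunk costs roughly one ret
misprediction -- which is what the informal 20-30 cycle measurement
above is timing.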