Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 21 Jan 2018 18:23:59 -0800
From: Andy Lutomirski <>
To: Linus Torvalds <>
Cc: Andy Lutomirski <>, Jann Horn <>, 
	Dan Williams <>, Thomas Gleixner <>, 
	linux-arch <>, 
	Kernel Hardening <>, 
	Greg Kroah-Hartman <>, "the arch/x86 maintainers" <>, 
	Ingo Molnar <>, "H. Peter Anvin" <>, Alan Cox <>
Subject: Re: Re: [PATCH v4.1 07/10] x86: narrow out of
 bounds syscalls to sys_read under speculation

On Sun, Jan 21, 2018 at 6:04 PM, Linus Torvalds
<> wrote:
> On Sun, Jan 21, 2018 at 5:38 PM, Andy Lutomirski <> wrote:
>> 3. What's with sbb; and?  I can see two sane ways to do this.  One is
>> cmovaq [something safe], %rax,
> Heh. I think it's partly about being old-fashioned. sbb has always
> been around, and is the traditional trick for 0/-1.
> Also, my original suggested thing did the *access* too, and masked the
> result with the same mask.
> But I guess we could use cmov instead. It has very similar performance
> (ie it was relatively slow on P4, but so was sbb).
> However, I suspect it actually has a slightly higher register
> pressure, since you'd need to have that zero register (zero being the
> "safe" value), and the only good way to get a zero value is the xor
> thing, which affects flags and thus needs to be before the cmp.
> In contrast, the sbb trick has no early inputs needed.
> So on the whole, 'cmov' may be more natural on a conceptual level, but
> the sbb trick really is a very "traditional x86 thing" to do.

Fair enough.

That being said, what I *actually* want to do is to nuke this thing
entirely.  I just wrote a patch to turn off the SYSCALL64 fast path
entirely when retpolines are on.  Then this issue can be dealt with in
C.  I assume someone has a brilliant way to make gcc automatically do
something intelligent about guarded array access in C. </snicker>
Seriously, though, the retpolined fast path is actually slower than
the slow path on a "minimal" retpoline kernel (which is what I'm using
because Fedora hasn't pushed out a retpoline compiler yet), and I
doubt it's more than the tiniest speedup on a full retpoline kernel.

I've read a bunch of emails flying around saying that retpolines
aren't that bad.  In very informal benchmarking, a single mispredicted
ret (which is what a retpoline is) seems to take 20-30 cycles on
Skylake.  That's pretty far from "not bad".  Is IBRS somehow doing
something that adversely affects code that doesn't use indirect
branches?  Because I'm having a bit of a hard time imagining IBRS
hurting indirect branches worse than retpolines do.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.