Date: Sat, 11 Nov 2017 08:13:33 -0800
From: Kees Cook <keescook@...omium.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Patrick McLean <chutzpah@...too.org>, Emese Revfy <re.emese@...il.com>, 
	Al Viro <viro@...iv.linux.org.uk>, Bruce Fields <bfields@...hat.com>, 
	"Darrick J. Wong" <darrick.wong@...cle.com>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, 
	Linux NFS Mailing List <linux-nfs@...r.kernel.org>, stable <stable@...r.kernel.org>, 
	Thorsten Leemhuis <regressions@...mhuis.info>, 
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>
Subject: Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

On Fri, Nov 10, 2017 at 6:36 PM, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
> [ Bringing in the gcc plugin people and the kernel hardening list,
> since it now is no longer even remotely looking like a nfsd, vfs or
> filesystem issue any more ]
>
> Kees, Emese,
>  the whole thread is on lkml, but there's clearly something horribly
> wrong with RANDSTRUCT, and it's not new even though it looked that way
> for a while.

It wouldn't be the first issue we've seen; it's (obviously) a pretty
aggressive change to the resulting build.

> Patrick seems to trigger it with nfsd, so it might be specific to that.
>
> Alternatively, it might just be that very few people run
> RANDSTRUCT-built kernels, or just have been lucky with the seeding.

Given how hard a shuffled layout can be on cache lines, I'm not
surprised that its usage is more limited than other features.

> Sorry for top-posting, but there's not really anything in the email
> itself to reply to, other than saying thanks to Patrick for narrowing
> it down like this.

Agreed; thanks Patrick! :) Given that the issue is non-deterministic,
I wonder if the bug is related to some kind of missing RCU or barrier
that goes unnoticed in normal struct layouts.

> It would have been very interesting if it had actually bisected to
> something, but it seems that the real issue is just the choice of
> seeding for RANDSTRUCT.

That's where we've seen bugs in the past: some pathological ordering
of a struct uncovers a corner case. In the past it's been much more
deterministic: doesn't build, or immediately crashes on boot, etc.
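
To make "corner case" concrete, here is a minimal sketch of the kind of
layout assumption that reordering tends to expose. Everything in it is
hypothetical: the structs, field names and helpers are made up for
illustration, are not taken from nfsd or any other kernel code, and are not
a diagnosis of this bug. The second struct just stands in for what a
different seed might produce from the same declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct (not from the kernel): field order as written. */
struct item_declared {
	unsigned int  id;
	unsigned char is_dir;
	unsigned int  owner_uid;
};

/* Same fields, different order: an assumed stand-in for a reshuffled build. */
struct item_shuffled {
	unsigned int  owner_uid;
	unsigned int  id;
	unsigned char is_dir;
};

/* Layout-dependent read: silently assumes 'id' lives at offset 0. */
static unsigned int id_by_cast(const void *item)
{
	unsigned int id;

	memcpy(&id, item, sizeof(id));
	return id;
}

/* Layout-independent read: names the field, so any order works. */
static unsigned int id_by_name(const struct item_shuffled *item)
{
	return item->id;
}

/* Walks through both layouts; every assert below is expected to hold. */
static void layout_demo(void)
{
	struct item_declared a = { .id = 7, .is_dir = 0, .owner_uid = 1000 };
	struct item_shuffled b = { .id = 7, .is_dir = 0, .owner_uid = 1000 };

	assert(id_by_cast(&a) == 7);    /* right by luck of the layout */
	assert(id_by_cast(&b) == 1000); /* now reads owner_uid instead */
	assert(id_by_name(&b) == 7);    /* named access is unaffected */
}
```

Nothing fails to build and nothing crashes in the shuffled case; the code
just quietly reads the wrong field, which is the sort of thing that would
stay hidden until a particular ordering comes along.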

I'll take a closer look at this and see if I can provide something to
narrow it down.

-Kees

>
>                  Linus
>
> On Fri, Nov 10, 2017 at 4:27 PM, Patrick McLean <chutzpah@...too.org> wrote:
>> On 2017-11-10 03:26 PM, Patrick McLean wrote:
>>> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>>>>
>>>> I really don't see anything that looks even half-way suspicious in
>>>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>>>> _really_ subtle.
>>>>
>>>> And hey, it can be a real kernel bug too, that just happens to be
>>>> exposed by RANDSTRUCT, so a bisect really would be very nice.
>>>
>>> I am working on bisecting the issue now, but I think I have some more
>>> evidence pointing to a compiler issue related to RANDSTRUCT. There are
>>> actually 3 issues that we have seen. Sometimes we get the null pointer
>>> deref in the initial message, sometimes we get the GPF, and sometimes we
>>> see an issue where the NFS clients see all files as root-owned
>>> directories. Any given kernel will always see the same issue, but after
>>> a "make mrproper" and recompile (with the same .config), the issue will
>>> often change. I suspect that all 3 of these problems are actually the
>>> same issue manifesting itself in different ways depending on what seed
>>> the RANDSTRUCT gcc plugin is using.
>>
>> Further update on this, using the same seed for RANDSTRUCT, I have
>> reproduced this issue on v4.13.0, so it does not seem to be recently
>> introduced. The older kernel apparently only worked for us because we
>> were lucky. Generally we always compile new kernels from a fresh tree,
>> so they are never using the same seed.
>>
>> In case someone wants to play with this, here are some interesting seeds
>> (in include/generated/randomize_layout_hash.h):
>>
>> Produce a NULL pointer dereference (though I am not sure what the client
>> does to produce this).
>>     5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc
>>
>> All files for nfsd4 clients appear as directories owned as root, no
>> matter the real owner (this happens for all clients we have tested):
>>     3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e
>>
>> This is the seed that was breaking motherboards (make sure you have a
>> way to flash the BIOS with this one):
>>     3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd
>>
>> Finally, here is a seed that produces a kernel that does not exhibit any
>> problems we are aware of:
>>     e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b
>>
>>>>
>>>> Because in the end, compiler bugs are very rare. They are particularly
>>>> annoying when they do happen, though, so they loom big in the mind of
>>>> people who have had to chase them down.
>>>>



-- 
Kees Cook
Pixel Security
