kernel-hardening - Re: [PATCH] Introduce the pkill_on

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba67ead7-f075-e7ad-3274-d9b2bc4c1f44@linux.com>
Date: Sat, 2 Oct 2021 14:41:34 +0300
From: Alexander Popov <alex.popov@...ux.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
 Petr Mladek <pmladek@...e.com>
Cc: "Paul E. McKenney" <paulmck@...nel.org>, Jonathan Corbet
 <corbet@....net>, Andrew Morton <akpm@...ux-foundation.org>,
 Thomas Gleixner <tglx@...utronix.de>, Peter Zijlstra <peterz@...radead.org>,
 Joerg Roedel <jroedel@...e.de>, Maciej Rozycki <macro@...am.me.uk>,
 Muchun Song <songmuchun@...edance.com>,
 Viresh Kumar <viresh.kumar@...aro.org>, Robin Murphy <robin.murphy@....com>,
 Randy Dunlap <rdunlap@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>,
 Kees Cook <keescook@...omium.org>, Luis Chamberlain <mcgrof@...nel.org>,
 Wei Liu <wl@....org>, John Ogness <john.ogness@...utronix.de>,
 Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
 Alexey Kardashevskiy <aik@...abs.ru>,
 Christophe Leroy <christophe.leroy@...roup.eu>, Jann Horn
 <jannh@...gle.com>, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Mark Rutland <mark.rutland@....com>, Andy Lutomirski <luto@...nel.org>,
 Dave Hansen <dave.hansen@...ux.intel.com>,
 Steven Rostedt <rostedt@...dmis.org>, Will Deacon <will.deacon@....com>,
 David S Miller <davem@...emloft.net>, Borislav Petkov <bp@...en8.de>,
 Kernel Hardening <kernel-hardening@...ts.openwall.com>,
 linux-hardening@...r.kernel.org,
 "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, notify@...nel.org
Subject: Re: [PATCH] Introduce the pkill_on_warn boot parameter

On 01.10.2021 22:59, Linus Torvalds wrote:
> On Thu, Sep 30, 2021 at 2:15 AM Petr Mladek <pmladek@...e.com> wrote:
>>
>> Honestly, I am not sure if panic_on_warn() or the new pkill_on_warn()
>> work as expected. I wonder who uses it in practice and what is
>> the experience.
> 
> Afaik, there are only two valid uses for panic-on-warn:
> 
>  (a) test boxes (particularly VM's) that are literally running things
> like syzbot and want to report any kernel warnings
> 
>  (b) the "interchangeable production machinery" fail-fast kind of situation
> 
> So in that (a) case, it's literally that you consider a warning to be
> a failure case, and just want to stop. Very useful as a way to get
> notified by syzbot that "oh, that assert can actually trigger".
> 
> And the (b) case is more of a "we have 150 million machines, we expect
> about a thousand of them to fail for any random reason any day
> _anyway_ - perhaps simply due to hardware failure, and we'd rather
> take a machine down quickly and then perhaps look at why only much
> later when we have some pattern to the failures".
> 
> You shouldn't expect panic-on-warn to ever be the case for any actual
> production machine that _matters_. If it is, that production
> maintainer only has themselves to blame if they set that flag.
> 
> But yes, the expectation is that warnings are for "this can't happen,
> but if it does, it's not necessarily fatal, I want to know about it so
> that I can think about it".
> 
> So it might be a case that you don't handle, but that isn't
> necessarily _wrong_ to not handle. You are ok returning an error like
> -ENOSYS for that case, for example, but at the same time you are "If
> somebody uses this, we should perhaps react to it".
> 
> In many cases, a "pr_warn()" is much better. But if you are unsure
> just _how_ the situation can happen, and want a call trace and
> information about what process did it, and it really is a "this
> shouldn't ever happen" situation, a WARN_ON() or a WARN_ON_ONCE() is
> certainly not wrong.
> 
> So think of WARN_ON() as basically an assert, but an assert with the
> intention to be able to continue so that the assert can actually be
> reported. BUG_ON() and friends easily result in a machine that is
> dead. That's unacceptable.
> 
> And think of "panic-on-warn" as people who can deal with their own
> problems. It's fundamentally not your issue.  They took that choice,
> it's their problem, and the security arguments are pure BS - because
> WARN_ON() just shouldn't be something you can trigger anyway.

Thanks, Linus.
And what do you think about the proposed pkill_on_warn?

Let me quote the rationale behind it.

Currently, the Linux kernel provides two types of reaction to kernel warnings:
 1. Do nothing (by default),
 2. Call panic() if panic_on_warn is set. That's a very strong reaction,
    so panic_on_warn is usually disabled on production systems.

>From a safety point of view, the Linux kernel misses a middle way of handling
kernel warnings:
 - The kernel should stop the activity that provokes a warning,
 - But the kernel should avoid complete denial of service.

>From a security point of view, kernel warning messages provide a lot of useful
information for attackers. Many GNU/Linux distributions allow unprivileged users
to read the kernel log (for various reasons), so attackers use kernel warning
infoleak in vulnerability exploits. See the examples:
https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html
https://a13xp0p0v.github.io/2020/02/15/CVE-2019-18683.html
https://googleprojectzero.blogspot.com/2018/09/a-cache-invalidation-bug-in-linux.html

Let's introduce the pkill_on_warn parameter.
If this parameter is set, the kernel kills all threads in a process that
provoked a kernel warning. This behavior is reasonable from a safety point of
view described above. It is also useful for kernel security hardening because
the system kills an exploit process that hits a kernel warning.

Linus, how do you see the proper way of handling WARN_ON() in kthreads if
pkill_on_warn is enabled?

Thanks!

Best regards,
Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.