kernel-hardening - Re: [PATCH] signal: Extend exec

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wiOS4Fi2tsXQrvLOiW69g4HiJYsqL6RPeTd14b4+2-Ykg@mail.gmail.com>
Date: Thu, 2 Apr 2020 11:06:02 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Jann Horn <jannh@...gle.com>, Alan Stern <stern@...land.harvard.edu>, 
	Andrea Parri <parri.andrea@...il.com>, Will Deacon <will@...nel.org>, 
	Peter Zijlstra <peterz@...radead.org>, Boqun Feng <boqun.feng@...il.com>, 
	Nicholas Piggin <npiggin@...il.com>, David Howells <dhowells@...hat.com>, 
	Jade Alglave <j.alglave@....ac.uk>, Luc Maranget <luc.maranget@...ia.fr>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Akira Yokosawa <akiyks@...il.com>, 
	Daniel Lustig <dlustig@...dia.com>, Adam Zabrocki <pi3@....com.pl>, 
	kernel list <linux-kernel@...r.kernel.org>, 
	Kernel Hardening <kernel-hardening@...ts.openwall.com>, Oleg Nesterov <oleg@...hat.com>, 
	Andy Lutomirski <luto@...capital.net>, Bernd Edlinger <bernd.edlinger@...mail.de>, 
	Kees Cook <keescook@...omium.org>, Andrew Morton <akpm@...ux-foundation.org>, 
	stable <stable@...r.kernel.org>, Marco Elver <elver@...gle.com>, 
	Dmitry Vyukov <dvyukov@...gle.com>, kasan-dev <kasan-dev@...glegroups.com>
Subject: Re: [PATCH] signal: Extend exec_id to 64bits

On Thu, Apr 2, 2020 at 6:14 AM Eric W. Biederman <ebiederm@...ssion.com> wrote:
>
> Linus Torvalds <torvalds@...ux-foundation.org> writes:
>
> > tasklist_lock is aboue the hottest lock there is in all of the kernel.
>
> Do you know code paths you see tasklist_lock being hot?

It's generally not bad enough to show up on single-socket machines.

But the problem with tasklist_lock is that it's one of our remaining
completely global locks. So it scales like sh*t in some circumstances.

On single-socket machines, most of the truly nasty hot paths aren't a
huge problem, because they tend to be mostly readers. So you get the
cacheline bounce, but you don't (usually) get much busy looping. The
cacheline bounce is "almost free" on a single socket.

But because it's one of those completely global locks, on big
multi-socket machines people have reported it as a problem forever.
Even just readers can cause problems (because of the cacheline
bouncing even when you just do the reader increment), but you also end
up having more issues with writers scaling badly.

Don't get me wrong - you can get bad scaling on other locks too, even
when they aren't really global - we had that with just the reference
counter increment for the user signal accounting, after all. Neither
of the reference counts were actually global, but they were just
effectively single counters under that particular load (ie the count
was per-user, but the load ran as a single user).

The reason tasklist_lock probably doesn't come up very much is that
it's _always_ been expensive. It has also caused some fundamental
issues (I think it's the main reason we have that rule that
reader-writer locks are unfair to readers, because we have readers
from interrupt context too, but can't afford to make normal readers
disable interrupts).

A lot of the tasklist lock readers end up looping quite a bit inside
the lock (looping over threads etc), which is why it can then be a big
deal when the rare reader shows up.

We've improved a _lot_ of those loops. That has definitely helped for
the common cases. But we've never been able to really fix the lock
itself.

                 Linus
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.