Message-ID: <878skmpcib.fsf_-_@x220.int.ebiederm.org>
Date: Fri, 28 Feb 2020 16:34:20 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: <linux-kernel@...r.kernel.org>
Cc: Al Viro <viro@...iv.linux.org.uk>, Kernel Hardening
<kernel-hardening@...ts.openwall.com>, Linux API
<linux-api@...r.kernel.org>, Linux FS Devel
<linux-fsdevel@...r.kernel.org>, Linux Security Module
<linux-security-module@...r.kernel.org>, Akinobu Mita
<akinobu.mita@...il.com>, Alexey Dobriyan <adobriyan@...il.com>, Andrew
Morton <akpm@...ux-foundation.org>, Andy Lutomirski <luto@...nel.org>,
Daniel Micay <danielmicay@...il.com>, Djalal Harouni <tixxdz@...il.com>,
"Dmitry V . Levin" <ldv@...linux.org>, Greg Kroah-Hartman
<gregkh@...uxfoundation.org>, Ingo Molnar <mingo@...nel.org>, "J . Bruce
Fields" <bfields@...ldses.org>, Jeff Layton <jlayton@...chiereds.net>,
Jonathan Corbet <corbet@....net>, Kees Cook <keescook@...omium.org>,
Oleg Nesterov <oleg@...hat.com>, Alexey Gladkov
<gladkov.alexey@...il.com>, Linus Torvalds
<torvalds@...ux-foundation.org>, Jeff Dike <jdike@...toit.com>, Richard
Weinberger <richard@....at>, Anton Ivanov
<anton.ivanov@...bridgegreys.com>
Subject: [PATCH 4/3] pid: Improve the comment about waiting in zap_pid_ns_processes
Oleg wrote a very informative comment, but with the removal of
proc_cleanup_work it is no longer accurate.
Rewrite the comment so that it only talks about the details
that are still relevant, and hopefully is a little clearer.
Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
---
kernel/pid_namespace.c | 31 +++++++++++++++++++------------
1 file changed, 19 insertions(+), 12 deletions(-)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 318fcc6ba301..01f8ba32cc0c 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -224,20 +224,27 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
} while (rc != -ECHILD);
/*
- * kernel_wait4() above can't reap the EXIT_DEAD children but we do not
- * really care, we could reparent them to the global init. We could
- * exit and reap ->child_reaper even if it is not the last thread in
- * this pid_ns, free_pid(pid_allocated == 0) calls proc_cleanup_work(),
- * pid_ns can not go away until proc_kill_sb() drops the reference.
+ * kernel_wait4() misses EXIT_DEAD children, and EXIT_ZOMBIE
+ * processes whose parent processes are outside of the pid
+ * namespace. Such processes are created with setns()+fork().
*
- * But this ns can also have other tasks injected by setns()+fork().
- * Again, ignoring the user visible semantics we do not really need
- * to wait until they are all reaped, but they can be reparented to
- * us and thus we need to ensure that pid->child_reaper stays valid
- * until they all go away. See free_pid()->wake_up_process().
+ * If those EXIT_ZOMBIE processes are not reaped by their
+ * parents before their parents exit, they will be reparented
+ * to pid_ns->child_reaper. Thus pid_ns->child_reaper needs to
+ * stay valid until they all go away.
*
- * We rely on ignored SIGCHLD, an injected zombie must be autoreaped
- * if reparented.
+ * The code relies on pid_ns->child_reaper ignoring
+ * SIGCHLD to cause those EXIT_ZOMBIE processes to be
+ * autoreaped if reparented.
+ *
+ * Semantically it is also desirable to wait for EXIT_ZOMBIE
+ * processes before allowing the child_reaper to be reaped, as
+ * that gives the invariant that when the init process of a
+ * pid namespace is reaped all of the processes in the pid
+ * namespace are gone.
+ *
+ * Once all of the other tasks are gone from the pid_namespace
+ * free_pid() will awaken this task.
*/
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
--
2.20.1