From 9179864fa24ecdabd2d2aadddf4d937cfb824090 Mon Sep 17 00:00:00 2001
From: Rich Felker
Date: Mon, 28 Sep 2020 17:30:04 -0400
Subject: [PATCH 3/3] trap, don't deadlock, on AS-unsafe operations after
 multithreaded fork

POSIX specifies the forked child of a multithreaded process to run in
an async signal context where it cannot call AS-unsafe functions. the
underlying motivation for this restriction is that internal resources
in the parent may have been owned by other threads, which do not exist
in the child, at the moment of fork, and may not be possible to access
safely.

prior to commit e01b5939b38aea5ecbe41670643199825874b26c, the child
simply skipped locking due to its being single-threaded, resulting in
silent use of inconsistent state, possibly leading to crashes,
deadlocks, or other kinds of misbehavior. since that change, any
attempt by the child to take a lock that was held in another thread of
the parent reliably deadlocks. this prevents unsafe forward progress,
but is an unexpected failure mode and difficult to log and handle in
automated build and testing workflows.

instead, track in the child that the fork was by a multithreaded
parent, and if so, trap before waiting on a lock. this only works for
internal locks serviced by LOCK/__lock, so it's possible that some
things using other mechanisms will still deadlock, but this should
catch most of the common problems and assist in finding and fixing
erroneous application code.

one subtlety that almost makes this change wrong is that we use
threads internally for POSIX AIO and for SIGEV_THREAD timers. the
existence of these internal threads cannot impose on what the
application can do in the child. these subsystems don't use the
affected locks themselves, but it's possible that a process that's
only "multithreaded as an implementation detail" in the parent could
fork, then legitimately create multiple threads in the child, leading
to lock contention in the child that is not a result of UB by the
application. in this case, however, pthread_create will have reset
need_locks to a positive value, disarming the trap.
---
 src/process/fork.c  | 2 +-
 src/thread/__lock.c | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/process/fork.c b/src/process/fork.c
index dbaa9402..538a0bcb 100644
--- a/src/process/fork.c
+++ b/src/process/fork.c
@@ -32,7 +32,7 @@ pid_t fork(void)
 		self->next = self->prev = self;
 		__thread_list_lock = 0;
 		libc.threads_minus_1 = 0;
-		if (libc.need_locks) libc.need_locks = -1;
+		if (libc.need_locks > 0) libc.need_locks = -2;
 	}
 	__aio_atfork(!ret);
 	__restore_sigs(&set);
diff --git a/src/thread/__lock.c b/src/thread/__lock.c
index 60eece49..75412cc8 100644
--- a/src/thread/__lock.c
+++ b/src/thread/__lock.c
@@ -24,6 +24,7 @@ void __lock(volatile int *l)
 	int current = a_cas(l, 0, INT_MIN + 1);
 	if (need_locks < 0) libc.need_locks = 0;
 	if (!current) return;
+	if (need_locks < -1) a_crash();
 	/* A first spin loop, for medium congestion. */
 	for (unsigned i = 0; i < 10; ++i) {
 		if (current < 0) current -= INT_MIN + 1;
-- 
2.21.0
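
The need_locks transitions described in the commit message can be sketched as a small standalone model. This is an illustration only, not musl code: struct model, enum lock_result, and the model_* functions are hypothetical names, the lock word is reduced to a plain int (0 = free, nonzero = held), and the early return for need_locks == 0 is an assumption about the part of __lock that lies outside the hunk shown. Instead of calling a_crash() or blocking, the model reports what the real function would do.

```c
#include <assert.h>

/* Hypothetical outcome codes; the real __lock returns void and either
   proceeds, blocks, or (with this patch) traps via a_crash(). */
enum lock_result { LOCK_PROCEEDS, LOCK_WOULD_WAIT, LOCK_WOULD_TRAP };

/* Stand-in for the single libc.need_locks field. */
struct model { int need_locks; };

/* Child-side fixup in fork(): mark "forked from a multithreaded parent". */
static void model_fork_child(struct model *m)
{
	if (m->need_locks > 0) m->need_locks = -2;
}

/* The commit message says pthread_create resets need_locks to a
   positive value; the exact value 1 is an assumption of this model. */
static void model_thread_create(struct model *m)
{
	m->need_locks = 1;
}

/* Simplified __lock: *l is 0 when free, nonzero when held. */
static enum lock_result model_lock(struct model *m, int *l)
{
	int need_locks = m->need_locks;
	if (!need_locks) return LOCK_PROCEEDS;   /* assumed: locking skipped */
	int current = *l;
	if (!current) *l = 1;                    /* models the a_cas fast path */
	if (need_locks < 0) m->need_locks = 0;
	if (!current) return LOCK_PROCEEDS;
	if (need_locks < -1) return LOCK_WOULD_TRAP; /* a_crash() in the patch */
	return LOCK_WOULD_WAIT;                  /* spin/futex wait in musl */
}

/* Walks through the three cases from the commit message. */
static void model_selfcheck(void)
{
	int held = 1;

	/* Multithreaded parent, child hits a lock held by a lost thread. */
	struct model a = { 1 };
	model_fork_child(&a);
	assert(model_lock(&a, &held) == LOCK_WOULD_TRAP);

	/* Child creates its own threads first: contention is legitimate. */
	struct model b = { 1 };
	model_fork_child(&b);
	model_thread_create(&b);
	assert(model_lock(&b, &held) == LOCK_WOULD_WAIT);

	/* Single-threaded parent: nothing changes in the child. */
	struct model c = { 0 };
	model_fork_child(&c);
	assert(model_lock(&c, &held) == LOCK_PROCEEDS);
}
```

Calling model_selfcheck() from a test harness exercises all three cases. The model keeps the ordering from the hunk above: need_locks is cleared before the trap check, but the check uses the local copy, so the trap still fires on that same call.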