Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 02 Jun 2023 10:28:09 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] [RFC] make clone() usable

On 2023-06-01 23:08, Rich Felker wrote:
> On Thu, Jun 01, 2023 at 01:12:57PM +0300, Alexey Izbyshev wrote:
>> On 2023-05-31 02:35, Rich Felker wrote:
>> >As discussed before (see the 2021 thread "Incorrect thread TID
>> >caching") clone() has been effectively unusable because it produces a
>> >child process in invalid state, with wrong tid in its thread
>> >structure, among other problems.
>> >
>> >The attached proposed patch attempts to make clone() usable by having
>> >it share the _Fork logic for establishing a consistent process state
>> >after forking, and also blocks use of flags which produce invalid
>> >state.
>> >
>> Tangentially, while thinking about this, I realized that because the
>> kernel clears the address of the exit futex on clone and _Fork
>> doesn't re-establish it, the following blocks forever:
>> 
>> void *thr(void *arg) {
>>         // Blocks in __tl_sync because __thread_list_lock is never
>> unlocked
>>         pthread_join(*(pthread_t *)arg, NULL));
>>         return NULL;
>> }
>> 
>> int main() {
>>         if (!fork()) {
>>                 static pthread_t pt;
>>                 pt = pthread_self();
>>                 pthread_create(&(pthread_t){0}, NULL, thr, &pt);
>>                 pthread_exit(NULL);
>>         }
>>         wait(NULL);
>> }
>> 
>> This could be fixed by the following change in _Fork.c:
>> 
>> -               self->tid = __syscall(SYS_gettid);
>> +               self->tid = __syscall(SYS_set_tid_address,
>> &__thread_list_lock);
> 
> Wow, I had no idea Linux's fork cleared the exit futex address. That's
> a big bug we've had around for nobody to have noticed...
> 
>> >With CLONE_VM, the old behavior is kept, with a caveat that the child
>> >context is extremely restrictive, ala vfork.
>> >
>> >It was raised previously (pardon the pun) that raise() should perhaps
>> >be modified to make a SYS_gettid syscall rather than using the thread
>> >structure tid, so that it's possible to call raise() in a vfork
>> >context without signaling the wrong process. It might make sense to do
>> >that now too.
>> >
>> Yes, it would be nice to have raise and abort (which also reads the
>> cached tid separately) working in a vfork child.
>> 
>> >diff --git a/src/linux/clone.c b/src/linux/clone.c
>> >index 8c1af7d3..e8f76bbc 100644
>> >--- a/src/linux/clone.c
>> >+++ b/src/linux/clone.c
>> >@@ -4,18 +4,55 @@
>> > #include <sched.h>
>> > #include "pthread_impl.h"
>> > #include "syscall.h"
>> >+#include "lock.h"
>> >+#include "fork_impl.h"
>> >+
>> >+struct clone_start_args {
>> >+	int (*func)(void *);
>> >+	void *arg;
>> >+	sigset_t sigmask;
>> >+};
>> >+
>> >+static int clone_start(void *arg)
>> >+{
>> >+	struct clone_start_args *csa = arg;
>> >+	__post_Fork(0);
>> >+	__restore_sigs(&csa->sigmask);
>> >+	return csa->func(csa->arg);
>> >+}
>> >
>> > int clone(int (*func)(void *), void *stack, int flags, void *arg, ...)
>> > {
>> >+	struct clone_start_args csa;
>> > 	va_list ap;
>> >-	pid_t *ptid, *ctid;
>> >-	void  *tls;
>> >+	pid_t *ptid = 0, *ctid = 0;
>> >+	void  *tls = 0;
>> >+
>> >+	/* Flags that produce an invalid thread/TLS state are disallowed. */
>> >+	int badflags = CLONE_THREAD | CLONE_SETTLS | CLONE_CHILD_CLEARTID;
>> >+	if (flags & badflags)
>> >+		return __syscall_ret(-EINVAL);
>> >
>> Maybe also check for NULL stack, as proposed in
>> https://www.openwall.com/lists/musl/2022/08/02/1 ?
> 
> I guess that can be done. Normally we don't check for null pointers,
> but I think the syscall would silently keep the existing stack pointer
> if passed null rather than blowing up, which we probably don't want --
> even without CLONE_VM, where it'd blow up, we'd be giving
> appears-to-work behavior without a contract for it to work.
> 
Yes, and this is not only about CLONE_VM. On some arches (e.g. i386, 
x86_64, aarch64) musl's __clone stores things on "stack" and hence 
segfaults if "stack" is NULL, but on other arches it doesn't (e.g. arm). 
Exposing this is inconsistent and confusing. If support for passing NULL 
"stack" is ever desired, __clone should be changed first.

>> > 	va_start(ap, arg);
>> >-	ptid = va_arg(ap, pid_t *);
>> >-	tls  = va_arg(ap, void *);
>> >-	ctid = va_arg(ap, pid_t *);
>> >+	if (flags & (CLONE_PIDFD | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID))
>> >+	 	ptid = va_arg(ap, pid_t *);
>> >+	if (flags & CLONE_CHILD_SETTID) {
>> >+		tls = va_arg(ap, void *);
>> >+		ctid = va_arg(ap, pid_t *);
>> >+	}
>> > 	va_end(ap);
>> >
>> >-	return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls,
>> >ctid));
>> >+	if (flags & CLONE_VM)
>> >+		return __syscall_ret(__clone(func, stack, flags, arg, ptid,
>> >tls, ctid));
>> >+
>> >+	__block_all_sigs(&csa.sigmask);
>> >+	LOCK(__abort_lock);
>> >+
>> >+	csa.func = func;
>> >+	csa.arg = arg;
>> >+	int ret = __clone(clone_start, stack, flags, &csa, ptid, tls, ctid);
>> >+
>> >+	__post_Fork(ret);
>> >+	__restore_sigs(&csa.sigmask);
>> >+	return __syscall_ret(ret);
>> > }
>> 
>> Another thing that might be somewhat relevant if we expect to see
>> wider usage of clone is that syscall (the public function) currently
>> blindly extracts six arguments via va_arg, which on some
>> architectures requires sufficient stack space not to crash. So on
>> i386 the following silly function will crash if passed to clone,
>> provided that "stack" points to the beginning of unmapped space past
>> the stack page end.
> 
> That's changed in the patch. It only calls va_arg for arguments whose
> existence the flags implies.
> 
I was talking about va_arg usage in syscall() (in misc/syscall.c). 
__clone only aligns stack on 16 bytes (at least on arches that I 
checked), but doesn't reserve space. So a sufficiently unlucky 
user-provided function will crash when called from __clone in the child, 
as in my example.

Alexey

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.