Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 01 Jun 2023 13:12:57 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] [RFC] make clone() usable

On 2023-05-31 02:35, Rich Felker wrote:
> As discussed before (see the 2021 thread "Incorrect thread TID
> caching") clone() has been effectively unusable because it produces a
> child process in invalid state, with wrong tid in its thread
> structure, among other problems.
> 
> The attached proposed patch attempts to make clone() usable by having
> it share the _Fork logic for establishing a consistent process state
> after forking, and also blocks use of flags which produce invalid
> state.
> 
Tangentially, while thinking about this, I realized that because the 
kernel clears the address of the exit futex on clone and _Fork doesn't 
re-establish it, the following blocks forever:

void *thr(void *arg) {
         // Blocks in __tl_sync because __thread_list_lock is never 
unlocked
         pthread_join(*(pthread_t *)arg, NULL));
         return NULL;
}

int main() {
         if (!fork()) {
                 static pthread_t pt;
                 pt = pthread_self();
                 pthread_create(&(pthread_t){0}, NULL, thr, &pt);
                 pthread_exit(NULL);
         }
         wait(NULL);
}

This could be fixed by the following change in _Fork.c:

-               self->tid = __syscall(SYS_gettid);
+               self->tid = __syscall(SYS_set_tid_address, 
&__thread_list_lock);

> With CLONE_VM, the old behavior is kept, with a caveat that the child
> context is extremely restrictive, ala vfork.
> 
> It was raised previously (pardon the pun) that raise() should perhaps
> be modified to make a SYS_gettid syscall rather than using the thread
> structure tid, so that it's possible to call raise() in a vfork
> context without signaling the wrong process. It might make sense to do
> that now too.
> 
Yes, it would be nice to have raise and abort (which also reads the 
cached tid separately) working in a vfork child.

> diff --git a/src/linux/clone.c b/src/linux/clone.c
> index 8c1af7d3..e8f76bbc 100644
> --- a/src/linux/clone.c
> +++ b/src/linux/clone.c
> @@ -4,18 +4,55 @@
>  #include <sched.h>
>  #include "pthread_impl.h"
>  #include "syscall.h"
> +#include "lock.h"
> +#include "fork_impl.h"
> +
> +struct clone_start_args {
> +	int (*func)(void *);
> +	void *arg;
> +	sigset_t sigmask;
> +};
> +
> +static int clone_start(void *arg)
> +{
> +	struct clone_start_args *csa = arg;
> +	__post_Fork(0);
> +	__restore_sigs(&csa->sigmask);
> +	return csa->func(csa->arg);
> +}
> 
>  int clone(int (*func)(void *), void *stack, int flags, void *arg, ...)
>  {
> +	struct clone_start_args csa;
>  	va_list ap;
> -	pid_t *ptid, *ctid;
> -	void  *tls;
> +	pid_t *ptid = 0, *ctid = 0;
> +	void  *tls = 0;
> +
> +	/* Flags that produce an invalid thread/TLS state are disallowed. */
> +	int badflags = CLONE_THREAD | CLONE_SETTLS | CLONE_CHILD_CLEARTID;
> +	if (flags & badflags)
> +		return __syscall_ret(-EINVAL);
> 
Maybe also check for NULL stack, as proposed in 
https://www.openwall.com/lists/musl/2022/08/02/1 ?

>  	va_start(ap, arg);
> -	ptid = va_arg(ap, pid_t *);
> -	tls  = va_arg(ap, void *);
> -	ctid = va_arg(ap, pid_t *);
> +	if (flags & (CLONE_PIDFD | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID))
> +	 	ptid = va_arg(ap, pid_t *);
> +	if (flags & CLONE_CHILD_SETTID) {
> +		tls = va_arg(ap, void *);
> +		ctid = va_arg(ap, pid_t *);
> +	}
>  	va_end(ap);
> 
> -	return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, 
> ctid));
> +	if (flags & CLONE_VM)
> +		return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, 
> ctid));
> +
> +	__block_all_sigs(&csa.sigmask);
> +	LOCK(__abort_lock);
> +
> +	csa.func = func;
> +	csa.arg = arg;
> +	int ret = __clone(clone_start, stack, flags, &csa, ptid, tls, ctid);
> +
> +	__post_Fork(ret);
> +	__restore_sigs(&csa.sigmask);
> +	return __syscall_ret(ret);
>  }

Another thing that might be somewhat relevant if we expect to see wider 
usage of clone is that syscall (the public function) currently blindly 
extracts six arguments via va_arg, which on some architectures requires 
sufficient stack space not to crash. So on i386 the following silly 
function will crash if passed to clone, provided that "stack" points to 
the beginning of unmapped space past the stack page end.

int f(void *arg) {
         return syscall(SYS_gettid);
}

And it will crash even on x86-64 if tail call optimization triggers, 
which doesn't happen with current GCC due to long/int return type 
mismatch.

(This test crashes with glibc 2.37 as well.)

pthread_create masks this issue because it always consumes some stack 
space with struct start_args.

Thanks,
Alexey

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.