Date: Mon, 2 May 2022 17:18:56 -0400
From: Rich Felker <>
To: Alexey Izbyshev <>
Subject: Re: vfork()-based posix_spawn() has more failure modes than
 fork()-based one

On Mon, May 02, 2022 at 10:26:36PM +0300, Alexey Izbyshev wrote:
> Hi,
> I was recently made aware via [1] that vfork() can have more failure
> modes than fork() on Linux. The only case I know about is due to
> Linux not allowing processes in different time namespaces to share
> address space, but probably there are or will be more. An example is
> below (requires Linux >= 5.6).
> $ cat test.c
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <spawn.h>
> #include <sys/wait.h>
> #include <unistd.h>
> int main(int argc, char *argv[], char *envp[]) {
>     if (getenv("TEST_FORK")) {
>         pid_t pid = fork();
>         if (pid < 0) {
>             perror("fork");
>             return 127;
>         }
>         if (pid == 0) {
>             execve(argv[1], argv + 1, envp);
>             _exit(127);
>         }
>     } else {
>         int err = posix_spawn(0, argv[1], 0, 0, argv + 1, envp);
>         if (err) {
>             printf("posix_spawn: %s\n", strerror(err));
>             return 127;
>         }
>     }
>     wait(NULL);
>     return 0;
> }
> $ musl-gcc test.c
> $ unshare -UrT ./a.out /bin/echo OK
> posix_spawn: Invalid argument
> $ TEST_FORK=1 unshare -UrT ./a.out /bin/echo OK
> OK
> A common expectation from applications is that they can use
> posix_spawn() as a drop-in replacement for fork()/exec() (when its
> child-tweaking features are sufficient), but this case breaks the
> expectation. Do you think it would make sense for musl to fall back
> to fork() in case vfork() fails in posix_spawn()?
> I've also opened a bug about this in glibc[2]. Maybe libcs could
> coordinate in how they handle this case.
> Alexey
> [1]
> [2]

I'm trying to understand how this comes to be. The child should
inherit the namespaces of the parent and thus should not be in a
different namespace that precludes spawn. I'm guessing this is some
oddity where unshare doesn't affect the process itself, only its
children? If so, it seems like a bug that it doesn't affect the
process itself after execve (after unshare(1) runs your test program),
but that probably can't be fixed now on the Linux side for stability
reasons. :(

For what it's worth, I feel like the answer here is really that you
can't expect everything (or anything) to work after you've created a
bad or inconsistent process state, which can be done in various ways
like using unshare(2) in certain ways in a multithreaded process, certain
manual uses of clone(2), etc. Apparently unsharing time ns is one of
those things too, and if it behaves the way it seems to, I don't think
you can use it at all without an extra fork (adding -f to the
unshare(1) command line). Otherwise the top-level process in your
"container" and its children will be in different time namespaces,
which is not at all what you would want anyway.

We probably could make posix_spawn retry __clone without CLONE_VM if
it fails with certain errors, as long as those errors are
unambiguous about indicating a need for retry. I don't see EINVAL
documented as being possible for any cases that would need to be
treated as errors, but then again it doesn't seem to be documented for
this corner case you found either.
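
To make the retry idea concrete, here is a rough sketch of what it might look like from the outside. It is not musl's posix_spawn: it uses the public clone() wrapper rather than __clone, the helper name spawn_with_retry and the 64 KiB stack size are made up for illustration, and treating EINVAL as the one retryable error is exactly the assumption in question above. A real implementation would also block signals, apply file actions and attributes, and report exec failure back to the parent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arguments handed to the child through clone()'s void* parameter. */
struct child_args {
    char *const *argv;
    char *const *envp;
};

/* Runs in the child: exec the target, or exit with 127 on failure. */
static int child_fn(void *p)
{
    struct child_args *a = p;
    execve(a->argv[0], a->argv, a->envp);
    _exit(127);
}

/*
 * Hypothetical helper: try a vfork-like clone first, and retry without
 * CLONE_VM (fork-like, no shared address space) if the kernel rejects
 * the request with EINVAL. Returns the child pid, or -1 with errno set.
 */
static pid_t spawn_with_retry(char *const argv[], char *const envp[])
{
    enum { STACK_SIZE = 64 * 1024 };  /* arbitrary size for this sketch */
    struct child_args a = { argv, envp };
    char *stack;
    pid_t pid;
    int saved_errno;

    assert(argv && argv[0]);

    stack = malloc(STACK_SIZE);
    if (!stack)
        return -1;

    /* Stacks grow down on the usual targets, so pass the top of the buffer. */
    pid = clone(child_fn, stack + STACK_SIZE,
                CLONE_VM | CLONE_VFORK | SIGCHLD, &a);

    /* Assumption: EINVAL here means "sharing the address space is not allowed". */
    if (pid < 0 && errno == EINVAL)
        pid = clone(child_fn, stack + STACK_SIZE, SIGCHLD, &a);

    /*
     * Freeing is safe in both cases: with CLONE_VFORK the parent only
     * resumes after the child has exec'd or exited, and without CLONE_VM
     * the child runs on its own copy of the stack.
     */
    saved_errno = errno;
    free(stack);
    errno = saved_errno;
    return pid;
}
```

A caller would use it like fork()/exec(): call spawn_with_retry(argv, envp) with argv[0] as the path to execute, then waitpid() on the returned pid. Whether retrying on EINVAL is safe depends on whether any other EINVAL cause should be reported to the caller instead, which is the open question.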
