musl - Re: Calling setxid() in a vfork()-child

Message-ID: <20201012145549.GG17637@brightrain.aerifal.cx>
Date: Mon, 12 Oct 2020 10:55:50 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Calling setxid() in a vfork()-child

On Mon, Oct 12, 2020 at 12:27:44PM +0300, Alexey Izbyshev wrote:
> Hello,
> 
> I'm investigating the possibility of using vfork() instead of fork() in
> a Linux-only application. Before calling execve(), the app might
> need to call some functions to set up the child, including setxid()
> (let's assume that security concerns of [1] are not applicable). I'm
> aware that POSIX doesn't allow that for vfork()-children, but I'm
> also aware that it might be OK on Linux if the set of functions is
> sufficiently constrained, and that vfork() is used to efficiently
> implement posix_spawn() in C libraries. However, setuid()/setgid()
> seem particularly tricky because of the need to call the actual
> syscall in all threads, so if a C library is unaware that setxid()
> is called in a vfork()-child, it might attempt to interact with
> threads of the parent process, potentially causing trouble. I've
> checked musl and found a recent commit[2] that fixes this exact
> issue. I've also checked glibc[3], but haven't found any handling of
> this case (and vfork() doesn't appear to do anything special in this
> regard either[4]).
> 
> Do I understand correctly that, from an application developer
> perspective, it's currently better to avoid setxid/setrlimit libc
> functions in a vfork()-child, and that using syscall() or avoiding
> vfork() entirely is preferred in this case?

Really, avoiding vfork entirely is preferable. The traditional
specification of vfork (before it was deprecated and removed from the
spec; POSIX has not had vfork for a *long* time) did not allow
*anything* after vfork except execve or _exit, so arguably it's UB,
although there's also some argument to be made that if we're
implementing the nonstandard and traditional vfork function it should
have most of the important traditional properties.
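
For reference, the only shape of vfork use that the traditional
specification permitted looks roughly like the sketch below: the child
calls execve and, if that fails, _exit, and nothing else. The helper
name run_with_vfork is hypothetical and only for illustration; this is
not a recommendation to use vfork.

```c
#define _DEFAULT_SOURCE   /* some libcs hide vfork() without this */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical helper: run `path` with `argv` and return its exit
 * status, or -1 if the child could not be created or waited for. */
int run_with_vfork(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = vfork();
    if (pid < 0)
        return -1;              /* vfork itself failed */

    if (pid == 0) {
        /* Child: shares the parent's memory until execve or _exit,
         * so it does nothing but those two calls. */
        execve(path, argv, environ);
        _exit(127);             /* only reached if execve failed */
    }

    /* Parent: resumes once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything inserted between vfork and execve here (set*id(), setrlimit(),
and so on) is exactly the territory discussed in this thread.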

Indeed as you found this is fixed in musl, largely because the failure
mode was so egregiously bad.

Note that in addition to the issue you're asking about, it's
fundamentally a bad idea to be using set*id() in a vforked child (or
anywhere in a process that calls vfork) because it leaves moments
where there are tasks in different privilege domains executing from
the same VM space. If the task that's dropped privileges does anything
that could lead to an attacker seizing control of the flow of
execution, rather than just getting access to the set*id()-reduced
privilege domain, they have full access to the original privilege
domain. This is why musl's multithreaded set*id() (__synccall) takes
care not to admit forward progress of any application code during the
transition, and goes to the trouble of having a thread list lock that
unlocks atomically with kernel task exit so that there is no race
window where a still-live thread can be missed.

In any case, IMO unless you're programming for NOMMU compatibility,
you should just forget vfork ever existed. There's no good reason to
use it. If a process can't fork because it's too big or the fork would
impact performance too much, posix_spawn can do far more than
vfork+execve can do portably. It can't do everything you can do with
vfork+execve if you're willing to break portability rules (i.e. invoke
UB), but with a helper executable to run in the child you can get that
all back.
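
As a minimal sketch of the portable route, the following uses
posix_spawn with a file action to redirect the child's stdout, which is
one of the things plain vfork+execve cannot do without running extra
code in the child. The helper name spawn_redirected and its parameters
are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run `path` with `argv`, sending the child's
 * stdout to `out_path`. Returns the exit status, or -1 on failure. */
int spawn_redirected(const char *path, char *const argv[],
                     const char *out_path)
{
    assert(path != NULL && argv != NULL && out_path != NULL);

    posix_spawn_file_actions_t fa;
    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;

    int rc = -1;
    pid_t pid;

    /* Ask the implementation to open out_path as fd 1 in the child;
     * no application code runs between child creation and exec. */
    if (posix_spawn_file_actions_addopen(&fa, 1, out_path,
            O_WRONLY | O_CREAT | O_TRUNC, 0644) == 0
        && posix_spawn(&pid, path, &fa, NULL, argv, environ) == 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            rc = WEXITSTATUS(status);
    }

    posix_spawn_file_actions_destroy(&fa);
    return rc;
}
```

Setup that posix_spawn has no attribute or file action for would go
into the helper executable mentioned above: spawn the helper this way,
and let it do the remaining setup in its own address space before it
execs the real target.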

Rich