Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 14 Aug 2020 20:07:26 -0600
From: Ariadne Conill <ariadne@...eferenced.org>
To: musl@...ts.openwall.com
Subject: Re: Restrictions on child context after multithreaded fork

Hello,

On 2020-08-14 15:41, Rich Felker wrote:
> musl 1.2.1 has exposed bugs in several applications and libraries
> caused by async-signal-unsafe code between (multithreaded) fork and
> subsequent exec. So far, dbus library code, pulseaudio library code,
> and libvirt have been found to be affected. A couple of the bug
> reports (with incomplete information) are:

I suspect the dbus library code has been broken for some amount of time 
because we've been tracking an issue where dbus stalls on KDE Wayland 
for a short period of time causing the session to come up strangely. 
1.2.1 did make this stall more pronounced though.

> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11602
> https://gitlab.alpinelinux.org/alpine/aports/-/issues/11815
> 
> Fixing the affected library code looks very straightforward; it's just
> a matter of doing proper iterations of existing data/state rather than
> allocating lists and using opendir on procfs and such. I've discussed
> fixes with Alpine Linux folks and I believe fixes have been tested,
> but I don't see any patches in aports yet.

I'll work on getting patches in next week.

> I've seen suspicions that the switch to mallocng exposed this, but I'm
> pretty sure it was:
> 
> https://git.musl-libc.org/cgit/musl/commit/?id=e01b5939b38aea5ecbe41670643199825874b26c

We carried this patch before upgrade to 1.2.1 and it was mostly fine, 
has been carried since 1.1.24-r7 AFAIK.

> Before this commit, the (incorrect) lock skipping logic allowed the
> child process to access inconsistent state left from the parent if it
> violated the requirement not to call AS-unsafe functions. Now, the
> lock attempt in the child rightly deadlocks before accessing state
> that was being modified under control of the lock in the parent. This
> is not specific to malloc but common with anything using libc-internal
> locks.
> 
> I'll follow up on this thread once there are patches for the known
> affected libraries.
> 
> Note that this is a type of bug that's possibly hard to get upstreams
> to take seriously. libvirt in particular, despite having multiple
> comments throughout the source warning developers that they can't do
> anything AS-unsafe between fork and exec, is somehow deeming malloc an
> exception to that rule because they want to use it (despite it clearly
> not being necessary).
> 
> And the dbus issue has been known for a long time; see open bug:
> 
> https://gitlab.freedesktop.org/dbus/dbus/-/issues/173
> (originally: https://bugs.freedesktop.org/show_bug.cgi?id=100843)
> 
> This is largely because glibc attempts to make the erroneous usage by
> these libraries work (more on that below).
> 
> The next issue of POSIX (Issue 8) will drop the requirement that fork
> be AS-safe, as a result of Austin Group tracker issue #62. This makes
> the glibc behavior permissible/conforming, but there does not seem to
> be any effort on the POSIX side to drop the requirement on
> applications not to do AS-unsafe things in the child before exec, so
> regardless of this change, what these libraries are doing is still
> wrong.
> 
> In order to make the child environment unrestricted after fork, either
> fork must hold *all* locks at the time the actual fork syscall takes
> place, or it must be able to reset any state protected by a lock that
> was held in the parent (or some mix of the two). It's fundamentally
> impossible to do this completely (in a way that lets the child run
> unrestricted), since some locks in the parent may be held arbitrarily
> long such that fork waiting on them would deadlock. In particular, any
> stdio FILE lock may be held indefinitely because there's a blocking
> operation in progress on the underlying fd, or because the application
> has called flockfile. Thus, at best, the implementation can give the
> child an environment where fflush(0) and exit() still deadlock.
> 
> In case we do want to follow a direction of trying to provide some
> degree of relaxation of restrictions on the child (taking the liberty
> of POSIX-future drop of fork's AS-safety requirement), I did a quick
> survey of libc-internal locks, and found:

I think it is better to fix programs to not depend on AS-unsafe 
functions at fork time.  Being lax on this requirement is an indicator 
of other bad engineering decisions in these programs, especially libvirt 
and pulseaudio.

Ariadne

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.