musl - Re: pthread shouldn't ignore errors from syscall futex()

Message-ID: <20200520160506.GL1079@brightrain.aerifal.cx>
Date: Wed, 20 May 2020 12:05:07 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
Subject: Re: pthread shouldn't ignore errors from syscall futex()

On Wed, May 20, 2020 at 03:31:46PM +0300, Konstantin Khlebnikov wrote:
> Userspace implementations of mutexes (including glibc) in some cases
> retry the operation without checking the error code from syscall futex.
> 
> Example which loops inside the second call rather than hanging (or dying) peacefully:
> 
> #include <stdlib.h>
> #include <pthread.h>
> 
> int main(int argc, char **argv)
> {
> 	char buf[sizeof(pthread_mutex_t) + 1];
> 	pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
> 
> 	pthread_mutex_init(mutex, NULL);
> 	pthread_mutex_lock(mutex);
> 	pthread_mutex_lock(mutex);
> }
> 
> Thread in lkml:
> https://lore.kernel.org/lkml/158955700764.647498.18025770126733698386.stgit@buzz/T/
> 
> Related bug in glibc:
> https://sourceware.org/bugzilla/show_bug.cgi?id=25997

In general, this behavior is intentional. If running on a system where
futex is broken (an incomplete implementation of the Linux syscall API,
Linux built with flags that break futex, which is possible on some
archs, etc.), or if the kernel cannot perform the wait because of an
OOM condition in the kernel (Linux is *not* written to be resilient
against OOM and it shows), the behavior degrades to spinlocks rather
than crashing. Aborting the application because of OOM conditions in
the kernel is simply not acceptable.
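
To illustrate the degradation described above, here is a minimal sketch
of a futex-based lock whose wait loop ignores the syscall result. This
is not musl's actual code; the names futex_wait, futex_wake,
sketch_lock and sketch_unlock are hypothetical, and the three-state
protocol (0 unlocked, 1 locked, 2 locked with possible waiters) is just
one common design. If the wait fails immediately for any reason, the
loop simply retries the atomic exchange, which amounts to a spinlock.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel to sleep while *addr == val. */
static long futex_wait(atomic_int *addr, int val)
{
	return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
}

/* Hypothetical helper: wake up to n waiters sleeping on addr. */
static long futex_wake(atomic_int *addr, int n)
{
	return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}

/* Lock states: 0 = unlocked, 1 = locked, 2 = locked, maybe waiters. */
static void sketch_lock(atomic_int *l)
{
	int c = 0;

	/* Fast path: uncontended acquire, no syscall at all. */
	if (atomic_compare_exchange_strong(l, &c, 1))
		return;

	/* Slow path: mark the lock contended and wait for it. */
	if (c != 2)
		c = atomic_exchange(l, 2);
	while (c != 0) {
		/* The result is deliberately ignored. If the wait
		 * fails right away, this loop becomes a spin. */
		(void)futex_wait(l, 2);
		c = atomic_exchange(l, 2);
	}
}

static void sketch_unlock(atomic_int *l)
{
	/* Only enter the kernel if someone may be waiting. */
	if (atomic_exchange(l, 0) == 2)
		(void)futex_wake(l, 1);
}
```

With a lock like this, the second pthread_mutex_lock in the quoted
example would correspond to the while loop spinning, since a wait on a
misaligned address fails immediately and the lock word never becomes 0.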

It would be possible to try to distinguish the causes of futex failure
and handle the unaligned case specially, but this would put more code
in hot paths, impacting size and possibly performance in valid
programs for the sake of catching a non-security bug in invalid ones.
This does not seem like a useful tradeoff.
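
For a sense of what "distinguishing the causes" would involve, the
following sketch classifies the errno left by a failed futex wait. The
enum and the function classify_wait_error are hypothetical, and the
grouping of error codes is an assumption based on the errors documented
for FUTEX_WAIT, where EINVAL covers a misaligned address among other
things. The point is that a branch like this would have to run after
every failed wait.

```c
#include <errno.h>

/* Hypothetical verdicts for a failed futex wait. */
enum wait_verdict {
	WAIT_RETRY,           /* normal: value changed, signal, timeout */
	WAIT_BAD_ADDRESS,     /* likely caller bug, e.g. misaligned object */
	WAIT_DEGRADE_TO_SPIN  /* futex unusable: keep retrying without sleep */
};

/* Assumed mapping from errno to a verdict; not taken from musl. */
static enum wait_verdict classify_wait_error(int err)
{
	switch (err) {
	case EAGAIN:
	case EINTR:
	case ETIMEDOUT:
		return WAIT_RETRY;
	case EINVAL:
	case EFAULT:
		return WAIT_BAD_ADDRESS;
	default:
		/* ENOSYS, ENOMEM, and anything unexpected. */
		return WAIT_DEGRADE_TO_SPIN;
	}
}
```

Note that EINVAL alone does not prove misalignment, so a real
implementation would still need an explicit address check before
deciding to trap.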

Assuming the buggy program actually calls pthread_mutex_init rather
than just using an uninitialized/zero-initialized mutex object at a
misaligned address, pthread_mutex_init (and likewise other pthread
object init functions) could possibly trap on the error (with no
syscall, just checking the address modulo _Alignof of the object
type) to catch it. I'm not sure if this is worthwhile though
since, while being UB, it doesn't seem to be UB with any security
impact.
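
A minimal sketch of such an init-time check, assuming a C11 compiler.
The names is_misaligned and checked_mutex_init are hypothetical, the
wrapper is not part of musl, and abort() stands in for whatever trap
mechanism an implementation would actually use.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p is not a multiple of align. No syscall involved. */
static int is_misaligned(const void *p, size_t align)
{
	return ((uintptr_t)p % align) != 0;
}

/* Hypothetical wrapper: refuse to initialize a misaligned mutex. */
static int checked_mutex_init(pthread_mutex_t *m,
                              const pthread_mutexattr_t *a)
{
	if (is_misaligned(m, alignof(pthread_mutex_t)))
		abort(); /* stand-in for a trap */
	return pthread_mutex_init(m, a);
}
```

Applied to the quoted example, the pointer buf + 1 would typically fail
this check whenever buf itself happens to be suitably aligned, so the
program would stop at init time instead of looping in the second lock
call.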

Rich
