musl - Re: pthread shouldn't ignore errors from syscall futex()

Message-ID: <a869a173-a8c7-0d82-41f8-0953b3db1068@yandex-team.ru>
Date: Wed, 20 May 2020 20:38:35 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Rich Felker <dalias@...c.org>, musl@...ts.openwall.com
Subject: Re: pthread shouldn't ignore errors from syscall futex()

On 20/05/2020 19.05, Rich Felker wrote:
> On Wed, May 20, 2020 at 03:31:46PM +0300, Konstantin Khlebnikov wrote:
>> Userspace implementations of mutexes (including glibc) in some cases
>> retry the operation without checking the error code from the futex syscall.
>>
>> Example which loops inside the second call rather than hanging (or dying) peacefully:
>>
>> #include <stdlib.h>
>> #include <pthread.h>
>>
>> int main(int argc, char **argv)
>> {
>> 	char buf[sizeof(pthread_mutex_t) + 1];
>> 	pthread_mutex_t *mutex = (pthread_mutex_t *)(buf + 1);
>>
>> 	pthread_mutex_init(mutex, NULL);
>> 	pthread_mutex_lock(mutex);
>> 	pthread_mutex_lock(mutex);
>> }
>>
>> Thread in lkml:
>> https://lore.kernel.org/lkml/158955700764.647498.18025770126733698386.stgit@buzz/T/
>>
>> Related bug in glibc:
>> https://sourceware.org/bugzilla/show_bug.cgi?id=25997
> 
> In general, this behavior is intentional. If running on a system where
> futex is broken (incomplete implementation of Linux syscall API,
> Linux built with flags that break futex which is possible on some
> archs, etc.), or if the kernel cannot perform the wait because of an
> OOM condition in the kernel (Linux is *not* written to be resilient
> against OOM and it shows), the behavior degrades to spinlocks rather
> than crashing. Aborting the application because of OOM conditions in
> the kernel is simply not acceptable.

Yes, an OOM condition in a cgroup before Linux 4.19 could definitely cause
almost any syscall to return EFAULT. This is worth documenting in the
futex man page.

But EINVAL from futex() always meant arguments were wrong.
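
To illustrate the EINVAL case, here is a minimal Linux-only sketch. The
helper name futex_wait_errno is made up for this example and is not part of
any library. It issues a raw FUTEX_WAIT and reports the resulting errno. The
caller is assumed to pass an expected value that differs from the word in
memory, so the call returns immediately instead of sleeping.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: do one raw FUTEX_WAIT on addr and return the errno
 * it failed with, or 0 if the syscall succeeded.
 * No timeout is passed, so "expected" must differ from the value stored
 * at addr; otherwise the call would block until a wakeup.
 */
static int futex_wait_errno(void *addr, int expected)
{
	assert(addr != NULL);
	if (syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0) == -1)
		return errno;
	return 0;
}
```

On an address that is not 4-byte aligned this should report EINVAL, while an
aligned word holding a different value should report EAGAIN.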

Ignoring unknown errors feels wrong anyway. It just hides bugs
and encourages these incomplete/buggy futex implementations to appear.

Also, silently degrading to spinlocks isn't very safe.
Not all schedulers guarantee progress if the waiter spins.
At least add some delay or yield to that fallback waiting loop.
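
A rough sketch of what a yielding fallback loop could look like. This is not
musl's code: the fallback_lock type and the three function names are
hypothetical, and a real implementation would still try futex first.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration; not musl's internal layout. */
typedef struct { atomic_int locked; } fallback_lock;

/* One attempt: returns 1 if the lock was taken, 0 if it is already held. */
static int fallback_trylock(fallback_lock *l)
{
	int expected = 0;
	return atomic_compare_exchange_strong(&l->locked, &expected, 1);
}

/* Spin until the lock is taken, giving up the CPU between attempts. */
static void fallback_lock_acquire(fallback_lock *l)
{
	while (!fallback_trylock(l))
		sched_yield();
}

/* Release the lock; calling this on an unlocked lock is a caller bug. */
static void fallback_unlock(fallback_lock *l)
{
	assert(atomic_load(&l->locked) == 1);
	atomic_store(&l->locked, 0);
}
```

sched_yield() only hints to the scheduler, so this does not guarantee progress
under every scheduling policy, but it avoids burning a full time slice per
failed attempt.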

> 
> It would be possible to try to distinguish the causes of futex failure
> and handle the unaligned case specially, but this would put more code
> in hot paths, impacting size and possibly performance in valid
> programs for the sake of catching a non-security bug in invalid ones.
> This does not seem like a useful tradeoff.

I've proposed sending SIGBUS from the syscall when the futex address is unaligned.
(In the LKML thread, see the link above.)

> 
> Assuming the buggy program actually calls pthread_mutex_init rather
> than just using an uninitialized/zero-initialized mutex object at
> misaligned address, pthread_mutex_init (and likewise other pthread
> object init functions) could possibly trap on the error (with no
> syscall, just looking for a misaligned address mod _Alignof() the
> object type) to catch it. I'm not sure if this is worthwhile though
> since, while being UB, it doesn't seem to be UB with any security
> impact.

Yeah, I'm worried more about debuggability and CO2 emissions =)
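
For reference, the syscall-free alignment check described in the quoted
paragraph could be sketched roughly as below. Both is_aligned and
checked_mutex_init are hypothetical names, and returning EINVAL instead of
trapping is an assumption made so the example is easy to exercise.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Hypothetical wrapper, not musl's pthread_mutex_init: reject storage that
 * is misaligned for pthread_mutex_t before initializing it. Takes void *
 * so the check happens before the pointer is treated as a mutex.
 */
static int checked_mutex_init(void *storage, const pthread_mutexattr_t *attr)
{
	if (!is_aligned(storage, _Alignof(pthread_mutex_t)))
		return EINVAL;
	return pthread_mutex_init(storage, attr);
}
```

With the buf + 1 pointer from the example program at the top of the thread,
the wrapper would fail at init time instead of looping in the second lock.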

> 
> Rich
>
