musl - More thoughts on wrapping signal handling


Message-ID: <20201029063448.GK534@brightrain.aerifal.cx>
Date: Thu, 29 Oct 2020 02:34:50 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: More thoughts on wrapping signal handling

In "Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275)."
(20201010002612.GC17637@...ghtrain.aerifal.cx,
https://www.openwall.com/lists/musl/2020/10/10/1) I raised the
longstanding thought of having libc wrap signal handling. This is a
little bit of a big hammer for what it was proposed for -- fixing an
extremely-rare race between abort and execve -- but today I had a
thought about another use of it that's really compelling.

What I noted before was that, by wrapping signal handlers, libc could
implement a sort of "rollback" to restart a critical section that was
interrupted. However, this is only useful when the critical section
has no side effects aside from its final completion, and, except for
execve, where replacement of the process gives the atomic cutoff for
rollback, it requires a __cp_end-like asm label marking the end of
the critical section. So it's of limited utility.

However, what's more interesting than restarting the critical section
when a signal is received is *allowing it to complete* before handling
the signal. This can be implemented by having the wrapper, upon seeing
that it interrupted a critical section, save the siginfo_t in TLS and
immediately return, leaving signals blocked, without executing the
application-installed signal handler. Then, when leaving the critical
section, the unlock function can see the saved siginfo_t and call the
application's signal handler. Effectively, it's as if the signal were
just blocked until the end of the critical section.
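
A minimal sketch of that deferral scheme, under stated assumptions: it
relies on the Linux behavior that the signal mask stored in the
handler's ucontext is the one restored on return, and it copies only
_NSIG/8 bytes of the mask (the kernel-visible part). All names here
(wrapper, crit_enter, crit_leave, install_wrapper, demo_handler) are
hypothetical and are not musl internals. It also passes a null context
to the deferred handler, which a real implementation would have to
handle properly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

typedef void (*sa_fn)(int, siginfo_t *, void *);

static sa_fn app_handler;                         /* application's handler */
static _Thread_local volatile sig_atomic_t in_critical;
static _Thread_local volatile sig_atomic_t have_pending;
static _Thread_local siginfo_t pending_info;      /* saved for later delivery */
static _Thread_local sigset_t saved_mask;         /* mask to restore afterwards */

/* Installed in place of the application's handler. */
static void wrapper(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    sigset_t all;

    if (!in_critical) {            /* not in a critical section: deliver now */
        app_handler(sig, si, ctx);
        return;
    }
    /* Interrupted a critical section: record the signal and return
     * with all signals left blocked in the interrupted context. */
    pending_info = *si;
    sigfillset(&all);
    sigemptyset(&saved_mask);
    memcpy(&saved_mask, &uc->uc_sigmask, _NSIG / 8);
    memcpy(&uc->uc_sigmask, &all, _NSIG / 8);
    have_pending = 1;
}

static void crit_enter(void)
{
    in_critical = 1;               /* a plain store, no syscall */
}

static void crit_leave(void)
{
    assert(in_critical);           /* leaving without entering is a bug */
    in_critical = 0;
    if (have_pending) {
        siginfo_t si = pending_info;
        have_pending = 0;
        app_handler(si.si_signo, &si, 0);   /* no real ucontext here */
        pthread_sigmask(SIG_SETMASK, &saved_mask, 0);
    }
}

/* Register the wrapper for sig. Blocking everything while the wrapper
 * runs keeps a second signal from overwriting pending_info. */
static int install_wrapper(int sig, sa_fn fn)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = wrapper;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigfillset(&sa.sa_mask);
    app_handler = fn;
    return sigaction(sig, &sa, 0);
}

/* Trivial application handler used to observe delivery. */
static volatile sig_atomic_t demo_calls;
static void demo_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)si; (void)ctx;
    demo_calls++;
}
```

With this in place, a signal raised between crit_enter() and
crit_leave() should not reach demo_handler until crit_leave() runs,
and the only syscall on that path is the single mask restore after a
signal actually arrived.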

What is the value in this?

1. It eliminates the need for syscalls to mask and unmask signals
   around all existing AS-safe locks and critical sections that can't
   safely be interrupted by application code.

2. It makes it so we can make almost any function that was AS-unsafe
   due to locking AS-safe, without any added cost. Even malloc can be
   AS-safe.

3. It makes it so a signal handler that fails to return promptly in
   one thread can't arbitrarily delay other threads waiting for
   libc-internal locks, because application code never interrupts our
   internal critical sections.
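
For comparison, the pattern that point 1 would eliminate looks roughly
like the sketch below: every pass through the critical section pays
for two signal-mask syscalls, whether or not a signal ever arrives.
This uses the public pthread_sigmask interface rather than any
libc-internal helper, and demo_lock, demo_state, and
update_state_masked are hypothetical names chosen for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_state;

/* Update shared state with all signals blocked, so no handler can run
 * (and possibly re-enter or stall) while the lock is held. */
static void update_state_masked(int v)
{
    sigset_t all, old;
    int r;

    sigfillset(&all);
    r = pthread_sigmask(SIG_BLOCK, &all, &old);    /* syscall 1: block */
    assert(r == 0);

    pthread_mutex_lock(&demo_lock);
    demo_state = v;                                /* the critical section */
    pthread_mutex_unlock(&demo_lock);

    r = pthread_sigmask(SIG_SETMASK, &old, 0);     /* syscall 2: restore */
    assert(r == 0);
}
```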

This last property, #3, is the really exciting one -- it means that,
as long as delays like swapping are ruled out (e.g. with mlockall and
other realtime measures taken), most libc locks can be considered as
held only for a very small bounded time, rather than a
potentially-unbounded time due to interruption by a signal handler.

I'm not sure if this is something worth pursuing, and certainly not in
the immediate future, but it is sounding more appealing.

Rich
