musl - fork, set*id(synccall), cancellation -- nasty interaction

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130426172722.GA21854@brightrain.aerifal.cx>
Date: Fri, 26 Apr 2013 13:27:22 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: fork, set*id(synccall), cancellation -- nasty interaction

I've run across a nasty set of race conditions I'm trying to solve:

1. __synccall, needed for multi-threaded set*id, needs to obtain a
lock that prevents thread creation and other things. However, setuid
and setgid are specified to be async-signal-safe. They will presently
hang if called from a signal handler that interrupted pthread_create
or several other functions.

2. When forking, the calling thread's pid/tid change, and there is a
window of time during which the child process has the wrong pid/tid in
its thread descriptor. Considering that the next version of POSIX will
remove fork from the list of async-signal-safe functions, and glibc's
fork is already non-async-signal-safe, it seems that we could just go
ahead and consider it unsafe. There are only two async-signal-safe
code paths that care about the pid/tid in the thread descriptor:
__synccall (called from setuid/setgid) and the cancellation signal
handler (called asynchronously). We could prevent these from ever
seeing wrong values in the thread descriptor by blocking signals
before making the fork syscall and restoring them afterwards. This is
probably the correct solution. However, there is still at least one
other problem:

3. If thread B forks while thread A is in the middle of starting a
__synccall operation, pthread_create, or anything else dealing with
the lock mentioned in point 1 above, the child process will inherit an
inconstent state. Normally this would not be a big deal since
multi-threaded programs are forbidden from doing anything
async-signal-unsafe after forking. However, setuid and setgid are
specified to be async-signal-safe, but cannot handle the inconsistent
state. The case where they're called after fork returns in the child
is not a problem, since libc.threads_minus_1 will be 0 and the
__synccall logic will not be used, but if fork was called from a
signal handler that interrupted setuid() after it was determined that
__synccall is needed but before __synccall actually began, we still
hit the trouble case. Perhaps this can be solved though: if __synccall
first blocks all signals except its own internal signal (blocking its
own before taking the lock would lead to deadlock) then tests
libc.threads_minus_1 before proceeding, it could determine that the
process has switched to single-threaded mode and avoid the whole
__synccall logic. Due to signals being blocked, there is no way fork
could be called in this window.

The hardest problem seems to be #1, and the only immediate solution I
see is just making the operation fail if the lock is not available.
This technically makes it async-signal-safe, but it's an undesirable
failure case...

Anyway, I'm mainly posting this to have a good record of the issues,
but I would also be happy to hear any ideas towards fixing them.

Rich

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.