Date: Sun, 16 Jun 2013 11:57:23 -0400
From: Rich Felker <>
Subject: Improving AIO implementation

The current AIO implementation in musl has some bugs (as Jens has
recently noticed) and limitations, and despite AIO being a rather ugly
and rarely-used set of interfaces, I think we should aim a bit higher
in quality still. The main issues I'm aware of are:

1. AIO is not synchronized with close. This bug is also present in
glibc. It is also very hard to fix, since close is required to be
async-signal-safe, but protecting AIO against its file descriptors
being closed and reopened is difficult to make async-signal-safe.
unshare(CLONE_FILES) seems to offer an approach to a solution, but
it's complicated by these issues:

A. The application-visible thread, if SIGEV_THREAD is used, would
still have to share file descriptors with the rest of the process, so
two threads would be needed for SIGEV_THREAD delivery rather than one.

B. The IO thread would have to find an inexpensive way of closing all
other file descriptors after unsharing, so as not to keep files
(mainly pipes, sockets, etc.) open after the application expected them
to be closed. However, this could interfere with fcntl locks, unless
unshare(CLONE_FILES) creates a new lock ownership context too.

Another possible solution is to dup the file descriptor for AIO, but
that also introduces an issue with fcntl locks: when the duplicate fd
is closed, locks would be lost.

2. There is no ordering between AIO operations on a given file
descriptor. Each AIO request is treated completely independently.
Based on my reading of XSH 2.8.2 Asynchronous I/O, the current
behavior seems at least borderline permissible (as long as we specify
the implementation-defined circumstances as being "at all times"), but
it's low-quality.

3. The aio_cancel function is not able to determine the correct
return value when its aiocb pointer argument is a null pointer. This
is because there
is no index of outstanding AIO operations on a given file descriptor.
The concept of such an index is even difficult with respect to the
close semantics in issue #1 above, and any solution based on unshare
would not help with implementing such indexing.
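To make the missing piece concrete, here is a minimal sketch of what a per-fd index of outstanding requests would have to support for aio_cancel(fd, NULL): walking all requests on the fd and aggregating the POSIX result codes. The structure and names (aio_req, cancel_all_sketch, try_cancel) are hypothetical, not musl's.

```c
/* Hypothetical sketch: a per-fd list of outstanding AIO requests, the
 * index whose absence makes aio_cancel(fd, NULL) unanswerable today.
 * Names and layout are illustrative, not musl's actual design. */
#include <stddef.h>
#include <aio.h>

struct aio_req {
	struct aiocb *cb;
	int done;              /* completed but not yet reaped */
	struct aio_req *next;  /* next outstanding request on this fd */
};

/* Cancel everything outstanding on one fd, computing the aggregate
 * value aio_cancel() must return when its aiocb argument is null:
 * AIO_NOTCANCELED if anything could not be canceled, AIO_CANCELED if
 * at least one request was canceled, AIO_ALLDONE otherwise. */
int cancel_all_sketch(struct aio_req *head,
                      int (*try_cancel)(struct aio_req *))
{
	int canceled = 0, not_canceled = 0;
	for (struct aio_req *r = head; r; r = r->next) {
		if (r->done) continue;
		if (try_cancel(r)) canceled++;
		else not_canceled++;
	}
	if (not_canceled) return AIO_NOTCANCELED;
	if (canceled) return AIO_CANCELED;
	return AIO_ALLDONE;
}
```

The aggregation itself is trivial; the hard part, as noted above, is maintaining such a list in the face of close's async-signal-safety requirement.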

On the other hand, some things about musl's AIO are more correct than
glibc's. For instance, aio_suspend is required to be
async-signal-safe, but glibc makes no effort to satisfy this
requirement. I cheated in musl by using a very inefficient form of
waking for aio_suspend, which is also something of a QoI issue.

Any thoughts on a direction for improving AIO? Based on the above
issues, I think we need to move to some model indexed by file
descriptor where close actually has to do the difficult work
(optimized-out in static linking via a weak symbol) of cancelling
pending AIO. Ordering of writes should be preserved except when they
are non-overlapping (i.e. a new write can start immediately except
when it overlaps with a pending AIO operation), and reads should be
unordered (immediately runnable) except that they are queued for the
fd when they overlap with an unfinished write.
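The ordering rules above reduce to a simple overlap check against the fd's pending operations: a new write must queue behind any overlapping pending operation, while a new read must queue only behind overlapping pending writes. A minimal sketch, with hypothetical names (pending_op, must_queue) rather than anything from musl:

```c
/* Hypothetical sketch of the queue-or-run decision described above.
 * Each fd would carry a list of pending operations; names are
 * illustrative only. */
#include <stddef.h>
#include <sys/types.h>

struct pending_op {
	off_t off;
	size_t len;
	int is_write;
	struct pending_op *next;
};

static int ranges_overlap(off_t a_off, size_t a_len,
                          off_t b_off, size_t b_len)
{
	return a_off < b_off + (off_t)b_len && b_off < a_off + (off_t)a_len;
}

/* Returns nonzero if a new request [off, off+len) must wait.
 * New writes wait behind any overlapping pending op (preserving write
 * ordering); new reads wait only behind overlapping pending writes. */
static int must_queue(const struct pending_op *pending,
                      off_t off, size_t len, int is_write)
{
	for (; pending; pending = pending->next) {
		if (!ranges_overlap(pending->off, pending->len, off, len))
			continue;
		if (is_write || pending->is_write)
			return 1;
	}
	return 0;
}
```

Non-overlapping requests and read-after-read thus remain immediately runnable, which keeps the common case as cheap as the current unordered behavior.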

As for implementing async-signal-safety, any AIO operations that
access or modify the index can block all signals to prevent close from
being called from a signal handler in the same thread while they are
in progress. However, unconditionally blocking signals in close to
prevent AIO functions from running in a signal handler while close is
working with the AIO index would be unjustifiably costly. Instead I
would propose an atomic global counter of file descriptors in the AIO
index. Only if this count is nonzero would close need to block signals
and check the index.
