Message-ID: <aLiIlbYcZ9es4ac3@voyager>
Date: Wed, 3 Sep 2025 20:27:33 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: ABA problem in aio_suspend()

Hi all,

I think the current aio_suspend() implementation is susceptible to an
ABA problem in __aio_fut, causing missed completions. Currently,
__aio_fut is set to the ID of the first thread calling aio_suspend() and
finding more than one aiocb to wait for. Of course, that means the
*second* thread to do so doesn't change __aio_fut at all, and has no
guarantee that __aio_fut hasn't been changed, and changed back, between
checking it and waiting on it.

I'm thinking of a scenario like this: Two threads are calling
aio_suspend() on disjoint sets of aiocbs. Both have more than one
operation to wait for. Thread 1 is successfully waiting in
__timedwait_cp(), so it has already set __aio_fut to its ID, say 1.

Now thread 2 also joins the fray. It sees __aio_fut set to 1, so does
not change it. When thread 2 is just about to call __timedwait_cp(), it
gets suspended. Then one of thread 2's operations finishes. The AIO
worker sees the nonzero __aio_fut, sets it to zero, and causes a
broadcast wake on it. However, this only wakes thread 1, which quickly
sees that none of its operations finished, and so thread 1 sets
__aio_fut back to 1 and continues to wait.

Now thread 2 resumes. __timedwait_cp() sees __aio_fut set to 1, as was
expected, and so thread 2 goes to wait until timeout, even though one of
its operations has already finished.
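
To make the interleaving concrete, here is a rough single-threaded
replay of it. This is not the musl code: fut stands in for __aio_fut,
futex_would_block() is a made-up helper that models "a futex wait only
sleeps if the word still holds the expected value", and the thread ID 1
is just the value from the scenario above.

```c
#include <stdbool.h>

/* Stand-in for __aio_fut (assumption: a plain int is enough for a replay). */
static int fut;

/* Hypothetical helper: a futex wait sleeps only if the word still
 * equals the value the caller expected. */
static bool futex_would_block(int expected)
{
    return fut == expected;
}

/* Replays the steps of the scenario in order. Returns true if thread 2
 * would go to sleep although one of its operations has finished. */
static bool replay_race(void)
{
    const int tid1 = 1;          /* thread 1's ID, as in the scenario */
    bool t2_op_done = false;

    fut = tid1;                  /* thread 1 is asleep on fut */

    int t2_expected = fut;       /* thread 2 sees nonzero, leaves it alone */
    /* thread 2 checks its aiocbs, finds none done, then gets suspended */

    t2_op_done = true;           /* worker finishes one of thread 2's ops */
    fut = 0;                     /* worker clears fut and broadcasts */

    fut = tid1;                  /* thread 1 wakes, finds nothing, re-arms */

    /* thread 2 resumes and enters the futex wait */
    return t2_op_done && futex_would_block(t2_expected);
}
```

replay_race() returns true here, i.e. the value thread 2 expects is back
in place by the time it waits, so the wake is lost.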

This might be remedied by turning __aio_fut into a counter with a waiters
bit. Every worker that completes an operation increments __aio_fut,
clears the waiters bit, and causes a broadcast wake if the waiters bit
was set. aio_suspend() reads the current counter *before* checking the
operations, then sets the waiters bit (immediately rechecking the
operations if that failed).
This is in theory still susceptible to an ABA problem, but now a thread
would have to sleep through 2 billion updates of __aio_fut to be
affected. I think that is an acceptable risk.
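
A minimal sketch of what I mean, using C11 atomics rather than musl's
internal a_cas() and friends. All names here (aio_fut_sketch,
sketch_complete(), sketch_snapshot(), sketch_prepare_wait()) are made up
for illustration, and the layout with the waiters bit in the top bit and
a 31-bit counter below it is an assumption. The actual futex wait and
wake calls are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define WAITERS_BIT 0x80000000u  /* assumption: top bit marks sleepers */
#define COUNT_MASK  0x7fffffffu  /* remaining 31 bits count completions */

static _Atomic uint32_t aio_fut_sketch;

/* Worker side: bump the counter and clear the waiters bit in one step.
 * Returns true if the caller should issue a broadcast wake. */
static bool sketch_complete(void)
{
    uint32_t old = atomic_load(&aio_fut_sketch), val;
    do {
        val = (old + 1) & COUNT_MASK;
    } while (!atomic_compare_exchange_weak(&aio_fut_sketch, &old, val));
    return (old & WAITERS_BIT) != 0;
}

/* Waiter side, step 1: read the counter before scanning the aiocbs. */
static uint32_t sketch_snapshot(void)
{
    return atomic_load(&aio_fut_sketch) & COUNT_MASK;
}

/* Waiter side, step 2: after the scan found nothing finished, set the
 * waiters bit. Returns true if it is safe to futex-wait on the value
 * (seen | WAITERS_BIT); false means the counter moved, so rescan. */
static bool sketch_prepare_wait(uint32_t seen)
{
    uint32_t expected = seen;
    if (atomic_compare_exchange_strong(&aio_fut_sketch, &expected,
                                       seen | WAITERS_BIT))
        return true;
    /* Another waiter may already have set the bit on the same count. */
    return expected == (seen | WAITERS_BIT);
}
```

The waiter would then sleep on the value seen | WAITERS_BIT. Any
completion in between changes the counter, so the futex wait fails
immediately instead of sleeping, unless the counter has wrapped all the
way around in the meantime.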

Ciao,
Markus
