Message-ID: <aPkTPXA9M4FUyQGx@voyager>
Date: Wed, 22 Oct 2025 19:24:13 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Subject: Deadlock in aio_cancel()

Good evening,

the AIO worker thread has to wait for preceding writes on the same FD to
finish before commencing its own work if the requested operation is a
sync or a write to an FD in append mode. It currently does so by calling
pthread_cond_wait().

pthread_cond_wait() is a cancellation point, but has to re-acquire the
mutex even if cancelled. So all in all, the AIO worker thread will
re-acquire the queue lock if it is cancelled while waiting for preceding
writes to complete, and that happens before it can run cleanup().

aio_cancel() will take the queue lock and cancel each thread with a
nonzero running flag. But it waits for the thread to actually finish
with a futex, so it selfishly hogs the queue lock.

That means the AIO worker thread tries to get the queue lock while its
holder, the thread that called aio_cancel(), waits for the AIO worker to
finish, so we have a deadlock.

The only solution I see is for aio_cancel() to give up the queue lock
while it waits for the thread to finish. However, that presents a new
problem: Now aio_cancel() has to restart the list iteration when it has
re-acquired the lock, since the list is not necessarily unchanged
afterward. And now I am wondering whether this could become an infinite
loop under some circumstances.

BTW, a secondary issue with this is that even if pthread_cond_wait()
managed to re-acquire the mutex, cleanup() will then deadlock again when
it takes the queue lock that the thread is already holding. But that
could be solved by adding another cleanup handler that releases the
queue lock if cancellation happens while waiting for writers.

Ciao,
Markus
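
For illustration only, the cancellation behaviour described in the message
can be reproduced outside musl with a small C sketch. It is not musl's AIO
code: q_lock, q_cond, worker(), unlock_on_cancel() and
demo_cancel_in_cond_wait() are hypothetical names, and pthread_join()
stands in for the futex wait in aio_cancel(). The sketch shows that a
thread cancelled inside pthread_cond_wait() enters its cleanup handler with
the mutex held, and that the cancelling thread avoids the deadlock by
dropping the lock before it waits.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the AIO queue lock and its condition
 * variable; these names are assumptions, not taken from musl. */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static int worker_waiting;       /* protected by q_lock */
static int lock_held_in_cleanup; /* set by the worker, read after join */

/* Cleanup handler: runs when the worker is cancelled while waiting.
 * pthread_cond_wait() has already re-acquired the mutex at this point,
 * so trylock on the (non-recursive) mutex is expected to fail. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_t *m = arg;
    lock_held_in_cleanup = (pthread_mutex_trylock(m) == EBUSY);
    pthread_mutex_unlock(m);
}

static void *worker(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&q_lock);
    pthread_cleanup_push(unlock_on_cancel, &q_lock);
    worker_waiting = 1;
    pthread_cond_signal(&q_cond);
    /* Wait "for preceding writers" that never finish; the only way out
     * is cancellation inside pthread_cond_wait(). */
    for (;;)
        pthread_cond_wait(&q_cond, &q_lock);
    pthread_cleanup_pop(1);
    return 0;
}

/* Returns 1 if the handler saw the mutex held, 0 if not, -1 on error. */
int demo_cancel_in_cond_wait(void)
{
    pthread_t t;
    void *res;

    worker_waiting = 0;
    lock_held_in_cleanup = 0;
    if (pthread_create(&t, 0, worker, 0) != 0)
        return -1;

    /* Make sure the worker is blocked in pthread_cond_wait(). */
    pthread_mutex_lock(&q_lock);
    while (!worker_waiting)
        pthread_cond_wait(&q_cond, &q_lock);
    /* Drop the lock BEFORE waiting for the worker to finish. Holding it
     * across the join below would reproduce the reported deadlock. */
    pthread_mutex_unlock(&q_lock);

    pthread_cancel(t);
    if (pthread_join(t, &res) != 0 || res != PTHREAD_CANCELED)
        return -1;

    /* The handler released the lock, so it must be free again. */
    if (pthread_mutex_trylock(&q_lock) != 0)
        return -1;
    pthread_mutex_unlock(&q_lock);
    return lock_held_in_cleanup;
}
```

demo_cancel_in_cond_wait() returns 1 when the handler found the mutex
locked. If the calling thread kept q_lock across pthread_join(), the
worker could never re-acquire it inside pthread_cond_wait() and both
threads would block forever.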