Date: Thu, 9 Apr 2015 23:31:54 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: libc-alpha@...rceware.org
Subject: Inherent race condition in linux robust_list system

While working on some of the code handling robust_list for robust (and other owner-tracked) mutexes in musl, I've come across a race condition that's inherent in the kernel's design for robust_list. There is no way to eliminate it with the current API, and I see no way to eliminate it without requiring a syscall to unlock robust mutexes.

The procedure for unlocking a robust_list tracked mutex looks like this:

1. Store the address of the mutex to be unlocked in the robust_list "pending" slot.
2. Remove the mutex from the robust_list linked list.
3. Unlock the mutex.
4. Clear the "pending" slot in the robust_list.

The purpose of the pending slot is so that the kernel can handle the case where a process dies asynchronously after removing the mutex from the linked list but before it's unlocked; in this case it treats the mutex like it's still in the list. But the kernel has no way of knowing whether such asynchronous process death occurs before or after step 3; it only knows it occurs between steps 2 and 4.

This is very bad. As soon as step 3 takes place, another process can take ownership of the mutex, and if it knows it's the last user, it can unlock and destroy the mutex and then reuse the same memory for a new purpose (imagine a shared-memory heap managed by a malloc-like allocator, which would be a good application for robust mutexes). Now, if the new use happens to store a value matching the tid of the thread whose process is dying at the offset where the mutex owner would be stored, the kernel misinterprets the new data stored there as a mutex belonging to the dying process, and happily proceeds to corrupt it!

Fixing this does not look easy.
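
As a rough illustration of the unlock steps and the failure mode described above, here is a single-threaded C model. It is only a sketch under stated assumptions: struct robust_head, kernel_exit_cleanup and simulate_race are hypothetical names made up for this example, the list unlink in step 2 is elided, and the cleanup function only approximates what the kernel does with the pending slot when a thread exits. The three constants mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from the Linux futex headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the Linux futex flag values (FUTEX_WAITERS,
 * FUTEX_OWNER_DIED, FUTEX_TID_MASK), so the model has no kernel
 * header dependency. */
#define WAITERS    0x80000000u
#define OWNER_DIED 0x40000000u
#define TID_MASK   0x3fffffffu

/* Hypothetical stand-in for the per-thread robust_list head; only the
 * "pending" slot matters for this model. */
struct robust_head {
    uint32_t *pending;  /* lock word currently being unlocked, or NULL */
};

/* Approximation of the kernel's exit-time handling of the pending slot:
 * if the word's owner field equals the dying tid, mark it owner-died. */
static void kernel_exit_cleanup(const struct robust_head *h, uint32_t tid)
{
    uint32_t *word = h->pending;
    if (word == NULL)
        return;
    if ((*word & TID_MASK) != tid)
        return;                      /* not ours: leave it alone */
    *word = (*word & WAITERS) | OWNER_DIED;
}

/* Replays the race in a fixed order and returns what ends up in the
 * reused memory. reused_value is whatever the new user of the memory
 * stored where the lock word used to be. */
uint32_t simulate_race(uint32_t tid, uint32_t reused_value)
{
    assert(tid != 0 && (tid & TID_MASK) == tid);

    uint32_t shared_word = tid;          /* mutex held by tid */
    struct robust_head head = { NULL };

    head.pending = &shared_word;         /* step 1: set pending slot */
    /* step 2: unlink the mutex from the robust list (elided) */
    shared_word = 0;                     /* step 3: unlock */

    /* Another process locks, unlocks and destroys the mutex, then the
     * allocator hands the same memory out for unrelated data. */
    shared_word = reused_value;

    /* The thread dies here, before step 4 clears the pending slot. */
    kernel_exit_cleanup(&head, tid);

    return shared_word;
}
```

In this model, simulate_race(1234, 1234) returns 0x40000000: the unrelated data that happened to equal the dying tid has been overwritten with the owner-died flag. simulate_race(1234, 99) returns 99, since the owner check fails and the word is left untouched.
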
The obvious way is to make clearing the pending slot of the robust_list effectively atomic with unlocking the mutex by doing them together in a (futex) syscall, but that would require a syscall every time a robust mutex is unlocked. An alternate approach would be enlarging the robust_list to have a PC range during which the pending slot is valid. This would avoid a syscall but would require the atomic unlock to be performed in asm (to provide labels for the PC range). I do not see any way to fix it without kernel changes.

Please note that this issue is distinct from glibc bug #14485, which is easily fixable and does not affect musl. The issue I'm describing here is much harder to fix because it's legal reuse of the same shared memory mapping the robust mutex existed in rather than reuse of the same virtual address range for a new mapping.

Rich