Date: Wed, 14 Dec 2022 13:29:54 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: sem_post() can miss waiters

On 2022-12-14 04:48, Rich Felker wrote:
> On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
>> On 2022-11-22 03:09, Rich Felker wrote:
>> >If I understand correctly, the cost of the first option is generally
>> >an extra "last" broadcast wake that shouldn't be needed. Is that
>> >right?
>> >
>> >For example, if sem starts with count 0 and thread A calls wait, then
>> >thread B calls post twice: both posts make a syscall even though only
>> >the first one should have.
>> >
>> Yes, exactly.
>> 
>> >What if you instead perform the broadcast wake when the observed
>> >waiters count is <=1 rather than ==0? This should have no cost in the
>> >common case, but in the race case, I think it forces any hiding
>> >(just-arrived) extra waiters to wake, fail, and re-publish their
>> >waiting status to the waiters bit.
>> >
>> Indeed, I think this solves the overhead issue quite nicely, thanks.
>> So the sem_post wake-up logic would basically boil down to this:
>> 
>> * WAITERS_BIT is set and waiters > 1: don't reset WAITERS_BIT since
>> it's likely that some waiters will remain (and it's always fine to
>> err on the side of preserving the WAITERS_BIT); wake a single
>> waiter.
>> 
>> * WAITERS_BIT is set and waiters <= 1: reset WAITERS_BIT and wake
>> all waiters. In non-racy cases only a single waiter will be woken.
>> 
>> * WAITERS_BIT is unset: don't wake anybody. Even if there are some
>> waiters, another sem_post (that reset the WAITERS_BIT) is
>> responsible for waking them.
>> 
>> In code:
>> 
>> int val, new, waiters, priv = sem->__val[2];
>> do {
>>     val = sem->__val[0];
>>     waiters = sem->__val[1];
>>     if ((val & SEM_VALUE_MAX) == SEM_VALUE_MAX) {
>>         errno = EOVERFLOW;
>>         return -1;
>>     }
>>     new = val + 1;
>>     /* 0x80000000 is the waiters bit; reset it unless more than
>>        one waiter is known to remain */
>>     if (waiters <= 1)
>>         new &= ~0x80000000;
>> } while (a_cas(sem->__val, val, new) != val);
>> /* val < 0 means the waiters bit was set: broadcast if we reset
>>    the bit, otherwise wake a single waiter */
>> if (val < 0) __wake(sem->__val, waiters <= 1 ? -1 : 1, priv);
> 
> Yes, this looks good to me. How is the attached patch?
> 
The patch looks good to me.
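
To spell out the waiter side this relies on (a rough sketch in the
style of the snippet above, not the actual patch; timeout and error
handling are omitted, and "at" stands for the deadline argument of
sem_timedwait): a waiter publishes its waiting status before sleeping,
and re-publishes it on the next iteration after every wake that fails
to acquire the semaphore:

while (sem_trywait(sem)) {
    int priv = sem->__val[2];
    a_inc(sem->__val+1);              /* publish: one more waiter */
    a_cas(sem->__val, 0, 0x80000000); /* set waiters bit if value is 0 */
    __timedwait(sem->__val, 0x80000000, CLOCK_REALTIME, at, priv);
    a_dec(sem->__val+1);              /* balance our increment */
}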

>> >I think there's another really stupid solution too that I would even
>> >consider, partly motivated by the fact that, with long-lived
>> >process-shared semaphores, the waiters count can become stale if
>> >waiters are killed. (Note: semaphores aren't required to be robust
>> >against this, but it's ugly that they're not.) This solution is just
>> >to keep the current code, but drop the waiters count entirely, and use
>> >broadcast wakes for all wakes. Then, any still-live waiters are
>> >*always* responsible for re-asserting their waiting status when all
>> >but one fail to acquire the semaphore after a wake. Of course this is
>> >a thundering herd, which is arguably something we should not want, but
>> >we accept it for correctness in several other places like condvars...
>> >
>> I'm not sure that the kind of "partial robustness" that could be
>> achieved by always waking all waiters is worth punishing normal
>> cases. Also, I don't think waiters count being stale is an issue per
>> se because it can be wrong only in one way (greater than the real
>> count of waiters), assuming you don't mean overflow (but overflow
>> could be handled by simply pinning it at INT_MAX permanently). But
>> many other issues are possible if we allow killing processes at
>> arbitrary points, including sem_post not sending a wake up
>> notification after updating the semaphore value, or sem_wait
>> consuming a notification but not updating the value. Granted, some
>> of these issues may "self-heal" on a future sem_wait/sem_post (in
>> the sense that the semaphore will leave a forbidden state), but they
>> might still interfere with the program shutdown logic (e.g. if the
>> program expects to claim the semaphore N times at the end without
>> being aware that some consumers are dead), and, of course, with any
>> logic outside of semaphore manipulation. So it seems to me that
>> either the program already watches for death of processes it cares
>> about, or it's not even clear that allowing further progress (due to
>> a broadcast) is always more desirable than blocking.
> 
> Indeed, this seems like it's just a bad direction to go for the
> reasons you described. One thing I'd like to note here though, but not
> act on at this time:
> 
> When we were designing the semaphore long ago, there was a vague
> proposal to make the post operation responsible for removing waiter
> counts, rather than the waiter. It was thrown out because it didn't
> seem to work. But if we ever did have a reason to want to do this, I
> think it's possible now, since it's essentially just a "countdown to
> broadcast wake" with no constraint that it be an accurate count, and
> the new waiters bit is what really controls whether wakes happen.
> 
Indeed, now the waiters count is just a hint used for resetting the 
waiters bit.
However, its increments/decrements must still be balanced for it to 
remain useful, and I don't see an easy way to achieve that if sem_post 
is responsible for decrementing it.
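
To make the balance problem concrete, here is one hypothetical
interleaving, assuming a scheme where a post decrements the count once
per wake it issues and a timed-out waiter leaves the count alone:

W1, W2: increment the count (0 -> 2), both sleep
W2:     times out and leaves; the count stays at 2
P1:     post; sees count 2 (> 1), keeps the waiters bit, wakes one
        waiter, decrements (2 -> 1); W1 wakes and acquires
P2:     post; sees count 1, resets the bit, broadcasts, decrements
        (1 -> 0); nobody is waiting, so the futex syscall is wasted

Letting the timed-out waiter decrement as well would risk a double
decrement, unless it could somehow tell whether a concurrent post had
already accounted for it.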

One further note: after this bug report I'd been pointed to a semaphore 
redesign proposal from 2015 (the thread starts at 
https://www.openwall.com/lists/musl/2015/02/27/1 after some off-list 
discussion and ends in April). In that design, the waiters count is stored 
in the semaphore word, and sem_post is indeed responsible for adjusting 
it. However, another counter ("wake count") is still needed to make the 
semaphore work.

Thanks,
Alexey
