Date: Sun, 5 Apr 2015 16:23:14 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Resuming work on new semaphore

On Sun, Apr 05, 2015 at 11:03:34PM +0300, Alexander Monakov wrote:
> On Sun, 5 Apr 2015, Rich Felker wrote:
> > 1. Thread A enters sem_wait.
> > 2. Thread B observes thread A in sem_wait via failed sem_trywait.
> 
> Hm, I don't see how that can be achieved.  As a result I'm afraid I didn't
> fully understand your example.

Indeed I was wrong about that, so I agree the whole scenario may fall
apart. Only sem_getvalue could show this, and only if it returns -1
rather than 0. So returning negative values from sem_getvalue seems
like a very bad idea -- it puts difficult- or impossible-to-satisfy
additional constraints on the implementation.
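
To make the "0 rather than -1" point concrete, here is a minimal sketch of a
sem_getvalue-style read that never reveals waiters. Everything in it is
hypothetical: struct toy_sem and toy_getvalue_clamped are illustration-only
names, and the assumption that a negative val[0] encodes waiters is mine, not
a statement about the actual layout under discussion. POSIX permits either
zero or a negative waiter count here; the sketch picks zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy state, NOT the real sem_t layout. */
struct toy_sem {
    int val[2]; /* assumption: val[0] < 0 means "there are waiters";
                   val[1] is not consulted in this sketch */
};

/* Report the value the way a sem_getvalue that hides waiters would:
 * any negative internal count is shown to the caller as 0. */
static int toy_getvalue_clamped(const struct toy_sem *s)
{
    assert(s != NULL);
    int v = s->val[0];
    return v < 0 ? 0 : v;
}
```

With this choice a caller can never distinguish "no waiters, value 0" from
"one waiter", so the implementation is free to decide when a thread counts as
a waiter.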

> > > Well we can make sem_getvalue return val[0]+val[1] instead... ;)
> > 
> > That just makes the new implementation look like the old one, no? :-)
> 
> Can't be bad if it behaves the same but works a bit faster.
> Apropos, like I've said on IRC, it looks like there's a "semaphore uncertainty
> principle": the formal semaphore value is between val[0] and (val[0] +/-
> val[1]) (clamped to 0 as needed).  It seems you can either do your hack and
> pretend that there are never any waiters, or try to faithfully count waiters
> in sem_getvalue, but then also reveal that sometimes the implementation works
> by stealing a post.  I believe you could argue that the latter is explicitly
> disallowed by the spec.

Yes, I think I agree.

> By the way, I think there's an interesting interplay with cancellation.
> Consider the following.  Thread B does "return sem_wait(sem);". Thread A does:
> 
>   pthread_cancel(thread_B);
>   sem_post(sem);
>   sem_getvalue(sem);
> 
> If it observes semaphore value as 1 it follows that thread B has not become a
> waiter yet, and since it must have cancellation already pending, it may not
> consume the post.  And yet if thread B is already futex-waiting in sem_wait,
> consuming the post takes priority over acting on cancellation.  So if then
> thread A does
> 
>   pthread_join(thread_B);
>   sem_getvalue(sem);
> 
> and gets a value of 0, it sees a contradiction.  And the return value from
> pthread_join will indicate that thread_B exited normally rather than being
> cancelled.

So the contradiction you claim exists is that cancellation happened
before the post, and thus thread B can't act on the post when it
didn't act on cancellation? I don't think that follows from the rules
of cancellation. The relevant text is:

    "Whenever a thread has cancelability enabled and a cancellation
    request has been made with that thread as the target, and the
    thread then calls any function that is a cancellation point (such
    as pthread_testcancel() or read()), the cancellation request shall
    be acted upon before the function."

So if cancellation was pending _before_ the call to sem_wait, then
sem_wait has to honor it. But there is no requirement that entry to
the sem_wait function be "atomic" with becoming a waiter on the
semaphore, and of course this is impossible to satisfy or even
specify. So it's totally legal to have the sequence:

1. Thread B enters sem_wait.
2. Thread B observes that cancellation was not already pending.
3. Thread A sends cancellation request.
4. Thread A sends post.
5. Thread B receives both, and chooses to act on the post per this
    text:

    "It is unspecified whether the cancellation request is acted upon
    or whether the cancellation request remains pending and the thread
    resumes normal execution if:

    - The thread is suspended at a cancellation point and the event for
    which it is waiting occurs

    - A specified timeout expired

    before the cancellation request is acted upon."

Here, the event for which it was waiting (the post) clearly occurs.
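
The sequence above can be exercised with a small test program. This is only a
sketch: struct outcome and run_scenario are hypothetical names, and the
program cannot force thread B to be inside sem_wait before the cancellation
request is sent, so any single run may take either path. What it can check is
that the two observations agree: either B was cancelled and the post is still
there (value 1), or B returned normally and consumed it (value 0).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical names for illustration only. */
struct outcome {
    int cancelled;   /* 1 if pthread_join reported PTHREAD_CANCELED */
    int final_value; /* sem_getvalue result after the join */
};

static sem_t sem;

/* Thread B: sem_wait is its only cancellation point. */
static void *thread_b(void *arg)
{
    (void)arg;
    sem_wait(&sem);
    return NULL;
}

/* Thread A's side. Returns 0 on success, -1 if setup failed. */
static int run_scenario(struct outcome *out)
{
    pthread_t b;
    void *res = NULL;

    assert(out != NULL);
    if (sem_init(&sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&b, NULL, thread_b, NULL) != 0) {
        sem_destroy(&sem);
        return -1;
    }

    pthread_cancel(b); /* cancellation request */
    sem_post(&sem);    /* post; B may act on either event */

    if (pthread_join(b, &res) != 0) {
        sem_destroy(&sem);
        return -1;
    }
    out->cancelled = (res == PTHREAD_CANCELED);
    sem_getvalue(&sem, &out->final_value);
    sem_destroy(&sem);
    return 0;
}
```

Build with -pthread. If thread B has not yet entered sem_wait when the
request arrives, cancellation is pending before the call and must be acted
upon; if it is already waiting, the quoted text leaves the choice unspecified.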

> And on the contrary, if you make acting on cancellation/timeout take priority,
> you can observe semaphore value increasing when waiters leave the wait on
> error path without consuming the post.

Yes, obviously that is not possible.

Rich
