musl - Re: Resuming work on new semaphore

Message-ID: <20150402231457.GC6817@brightrain.aerifal.cx>
Date: Thu, 2 Apr 2015 19:14:57 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Resuming work on new semaphore

On Fri, Apr 03, 2015 at 12:39:10AM +0300, Alexander Monakov wrote:
> On Thu, 2 Apr 2015, Rich Felker wrote:
> > > Interesting.  To examine the issue under a different light, consider that from
> > > the perspective of semaphore implementation, waiters that were killed,
> > > stopped, or pre-empted forever in the middle of sem_wait are
> > > indistinguishable.
> > 
> > Yes, I noticed this too. In that sense, theoretically there should be
> > no harm (aside from eventual overflow of pending wake counter) from
> > having asynchronously-killed waiters, assuming the implementation is
> > bug-free in the absence of async killing of waiters.
> 
> Did you mean "presence"?  I'm having trouble understanding your phrase,
> especially after "assuming ..."; can you elaborate or rephrase?

I meant: assuming there aren't already any bugs, then by your
reasoning adding async killing of waiters cannot add new ones (except
the overflow), since a killed waiter is equivalent to a situation that
already arises without async killing.

> That waiters can die breaks an assumption that operations on val[0] and val[1]
> do not under/overflow due to their range exceeding the number of
> simultaneously live tasks.

Right. I'm ignoring that one. The current implementation likewise has
that issue for the waiter count (but it could avoid it by saturating
the waiter count at INT_MAX I suppose, or by throwing away the waiter
count and just using a potential-waiters flag).
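
As an illustration of the saturation idea only, here is a minimal
sketch using C11 atomics. The function names and the standalone
counter are hypothetical and not taken from musl. It assumes that once
the count reaches INT_MAX it has to stay there, because the true
number of waiters is no longer known.

```c
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical helper: increment the waiter count, saturating at INT_MAX. */
static void waiters_inc(atomic_int *w)
{
    int v = atomic_load(w);
    while (v < INT_MAX) {
        /* On failure the CAS reloads v, so the bound is rechecked. */
        if (atomic_compare_exchange_weak(w, &v, v + 1))
            return;
    }
}

/* Hypothetical helper: decrement the waiter count unless it is zero or
 * saturated. A saturated count is sticky, so it degrades into a
 * "there may be waiters" flag. */
static void waiters_dec(atomic_int *w)
{
    int v = atomic_load(w);
    while (v > 0 && v < INT_MAX) {
        if (atomic_compare_exchange_weak(w, &v, v - 1))
            return;
    }
}
```

The cost of the sticky value is that posts would keep issuing wakes
even when nobody is waiting, which is harmless but slower.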

> > > Thus, subsequent sem_wait succeeds by effectively stealing
> > > a post, and to make things consistent you can teach sem_trywait to steal posts
> > > too (i.e. try atomic-decrement-if-positive val[1] just before returning
> > > EAGAIN, return 0 if that succeeds).
> > 
> > Hmm, perhaps that is valid. I'll have to think about it again. I was
> > thinking of having sem_trywait unconditionally down the value (val[0])
> > then imitate the exit path of sem_timedwait, but that's not valid
> > because another waiter could race and prevent sem_trywait from ever
> > being able to exit. But if it only does the down as a dec-if-positive
> > then it seems like it can safely dec-if-positive the wake count before
> > reporting failure.
> 
> I think my proposition above needs at least the following correction: when
> trywait succeeds in stealing a post by dec-if-positive(val[1]), it should also
> decrement val[0] before returning.

Yes, that seems right.
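
A minimal sketch of the trywait variant described above, with the
correction applied. It assumes only what is stated in this thread:
val[0] is the semaphore value and val[1] is the pending wake count.
The struct and function names are hypothetical, and this is not the
actual musl code.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical layout: val[0] = semaphore value, val[1] = pending wakes. */
struct sketch_sem { atomic_int val[2]; };

/* Atomically decrement *p only if it is positive.
 * Returns 1 if the decrement happened, 0 otherwise. */
static int dec_if_positive(atomic_int *p)
{
    int v = atomic_load(p);
    while (v > 0) {
        /* A failed CAS reloads v, so positivity is rechecked. */
        if (atomic_compare_exchange_weak(p, &v, v - 1))
            return 1;
    }
    return 0;
}

static int sketch_trywait(struct sketch_sem *s)
{
    /* Normal path: take the semaphore if its value is positive. */
    if (dec_if_positive(&s->val[0]))
        return 0;
    /* Before failing, try to steal a pending wake (a post whose
     * intended waiter may never consume it). */
    if (dec_if_positive(&s->val[1])) {
        /* The correction: account for the stolen post in val[0] too. */
        atomic_fetch_sub(&s->val[0], 1);
        return 0;
    }
    errno = EAGAIN;
    return -1;
}
```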

> Are you sure your proposition is invalid?  I don't think so.  How is trywait
> different from a timedwait with a timeout that immediately expires?  That is
> basically what your scheme should do.

Indeed, I think you're right. Conceptually trywait and timedwait with
zero timeout should be identical modulo error value and cancellation.
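
The equivalence can be seen from the caller's side with the standard
POSIX interfaces. This is a sketch, and the two wrapper names are
hypothetical. A timeout at the epoch is an absolute time that has
already passed, so sem_timedwait cannot block; the only visible
difference on an unavailable semaphore is EAGAIN versus ETIMEDOUT.

```c
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical wrapper: returns 0 if acquired, else the errno reported. */
static int try_via_trywait(sem_t *s)
{
    return sem_trywait(s) == 0 ? 0 : errno;
}

/* Hypothetical wrapper: same, using an already-expired absolute timeout. */
static int try_via_timedwait(sem_t *s)
{
    struct timespec past = { 0, 0 }; /* the epoch, long expired */
    return sem_timedwait(s, &past) == 0 ? 0 : errno;
}
```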

Rich