Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 12 Aug 2014 19:33:10 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: bug in pthread_cond_broadcast

On Wed, Aug 13, 2014 at 12:50:19AM +0200, Jens Gustedt wrote:
> Am Dienstag, den 12.08.2014, 17:20 -0400 schrieb Rich Felker:
> > On Tue, Aug 12, 2014 at 08:18:59PM +0200, Jens Gustedt wrote:
> > > so yes, the broadcast operation is synchronized with all other
> > > threads, that's the idea of this test
> > 
> > OK, thanks for clarifying. I'm still trying to understand where the
> > error in musl's accounting is happening --
> 
> I think these are the decrement operations on _c_waiters and
> _c_waiters2 in "unwait". In the case of the test, for pending waiters,
> these happen after other threads know that the condition variable is
> "released" by the main thread and have already entered the next phase.

I think you're saying the problem is this code:

        if (c->_c_waiters2) c->_c_waiters2--;
        else a_dec(&m->_m_waiters);

which is wrongly decrementing a newly-added waiter from the cond var
when the intent was that the waiter should be decremented from the
mutex. Is this correct?

One potential solution I have in mind is to get rid of this complex
waiter accounting by:

1. Having pthread_cond_broadcast set the new-waiter state on the mutex
   when requeuing, so that the next unlock forces a futex wake that
   otherwise might not happen.

2. Having pthread_cond_timedwait set the new-waiter state on the mutex
   after relocking it, either unconditionally (this would be rather
   expensive) or on some condition. One possible condition would be to
   keep a counter in the condvar object for the number of waiters that
   have been requeued, incrementing it by the number requeued at
   broadcast time and decrementing it on each wake. However the latter
   requires accessing the cond var memory in the return path for wait,
   and I don't see any good way around this. Maybe there's a way to
   use memory on the waiters' stacks?

> > I'd like to find a fix that
> > would be acceptable in the 1.0.x branch and make that fix before
> > possibly re-doing the cond var implementation (a fix which wouldn't be
> > suitable for backporting).
> 
> Some thoughts:
> 
> Basically, in "unwait" there shouldn't be any reference to c-> .  No
> pending thread inside timedwait should ever have to access the
> pthread_cond_t, again, it might already heavily used by other threads.

I'm not sure whether I agree with this or not -- as long as destroy is
properly synchronized, I don't think it's inherently a bug to access
c-> here, and I'm not clear yet on how the access could be eliminated.
I think this is a direction we should pursue too, but I'd like to
avoid invasive design changes for the backport to 1.0.x.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.