Date: Wed, 14 Dec 2022 09:49:26 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] mq_notify: fix close/recv race on failure path

On 2022-12-14 05:26, Rich Felker wrote:
> On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
>> In case of failure mq_notify closes the socket immediately after
>> sending a cancellation request to the worker thread, which is about to
>> call, or may have already called, recv on that socket. Even if we don't
>> consider the kernel behavior when the only descriptor to an object that
>> is being used in a system call is closed, if the socket descriptor is
>> closed before the kernel looks at it, another thread could open a
>> descriptor with the same value in the meantime, resulting in recv
>> acting on the wrong object.
>> 
>> Fix the race by moving the pthread_cancel call before the barrier wait
>> to guarantee that the cancellation flag is set before the worker thread
>> enters recv.
>> ---
>> Other ways to fix this:
>> 
>> * Remove the racing close call from mq_notify and surround recv
>>   with pthread_cleanup_push/pop.
>> 
>> * Make the worker thread joinable initially, join it before closing
>>   the socket on the failure path, and detach it on the happy path.
>>   This would also require disabling cancellation around join/detach
>>   to ensure that mq_notify itself is not cancelled in an inappropriate
>>   state.
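
For concreteness, here is a condensed, compilable sketch of the ordering
the patch establishes. The structure loosely follows musl's
src/mq/mq_notify.c, but notify_setup and the syscall_failed flag are
illustrative simplifications, not the actual patch:

#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

struct args {
    pthread_barrier_t barrier;
    int sock;
};

static void *worker(void *p)
{
    struct args *args = p;
    int s = args->sock;    /* consume args before the rendezvous */
    char buf[32];

    pthread_barrier_wait(&args->barrier);
    /* recv is a cancellation point; musl checks the cancellation flag
     * before entering the kernel, so if the flag was already set when
     * the barrier released us, we exit here without touching s. */
    recv(s, buf, sizeof buf, MSG_NOSIGNAL|MSG_WAITALL);
    return 0;
}

static int notify_setup(int s, int syscall_failed)
{
    struct args args = { .sock = s };
    pthread_attr_t attr;
    pthread_t td;

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_barrier_init(&args.barrier, 0, 2);
    if (pthread_create(&td, &attr, worker, &args))
        return -1;

    /* The fix: on failure, set the worker's cancellation flag *before*
     * the barrier wait, so it is guaranteed to be visible to the worker
     * before the worker can reach recv. */
    if (syscall_failed)
        pthread_cancel(td);

    pthread_barrier_wait(&args.barrier);
    pthread_barrier_destroy(&args.barrier);

    if (syscall_failed) {
        close(s);    /* safe: the worker can no longer act on s */
        return -1;
    }
    return 0;
}

The key point is that pthread_barrier_wait is not itself a cancellation
point, so the worker cannot be torn out of the rendezvous; it observes
the pending cancellation only at recv, before it can touch the
descriptor.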
> 
> I'd put this aside for a while because of the pthread barrier
> involvement I kinda didn't want to deal with. The fix you have sounds
> like it works, but I think I'd rather pursue one of the other
> approaches, probably the joinable thread one.
> 
> At present, the implementation of barriers seems to be buggy (I need
> to dig back up the post about that), and they're also a really
> expensive synchronization tool that goes both directions where we
> really only need one direction (notifying the caller we're done
> consuming the args). I'd rather switch to a semaphore, which is the
> lightest and most idiomatic (at least per present-day musl idioms) way
> to do this.
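
For reference, a minimal sketch of such a one-direction handoff, with a
semaphore in place of the barrier (an illustration of the idea, not of
an actual musl change; caller and worker are illustrative names):

#include <pthread.h>
#include <semaphore.h>

struct args {
    sem_t sem;
    int sock;    /* stand-in for the data being handed off */
};

static void *worker(void *p)
{
    struct args *args = p;
    int s = args->sock;      /* copy everything needed out of *args */
    sem_post(&args->sem);    /* then signal: args consumed */
    /* *args (the caller's stack frame) must not be touched past here */
    (void)s;
    return 0;
}

static int caller(struct args *args)
{
    pthread_t td;

    sem_init(&args->sem, 0, 0);
    if (pthread_create(&td, 0, worker, args))
        return -1;
    pthread_detach(td);

    /* One direction only: the caller blocks until the worker has
     * consumed args; there is no second rendezvous as with a
     * two-party barrier. */
    while (sem_wait(&args->sem));
    sem_destroy(&args->sem);
    return 0;
}

The handoff never blocks the worker on its way out; only the caller
waits, and only until the arguments have been consumed.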
> 
This sounds good to me. The same approach can also be used in
timer_create (assuming it's acceptable to add a dependency on
pthread_cancel to that code).

> Using a joinable thread also lets us ensure we don't leave around
> threads that are waiting to be scheduled just to exit on failure
> return. Depending on scheduling attributes, this probably could be
> bad.
> 
I also prefer this approach, though mostly for aesthetic reasons (I
haven't thought about the scheduling behavior). I didn't use it only
because I felt it was a "logically larger" change than simply moving
the pthread_barrier_wait call. And I wasn't aware that barriers are
buggy in musl.
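
For reference, a sketch of what the failure path could look like under
the joinable-thread scheme; fail_path and success_path are illustrative
names, not musl code:

#include <pthread.h>
#include <unistd.h>

/* Failure path: the worker was created joinable, so the caller can
 * synchronize with its complete exit before closing the socket.
 * Cancellation of the calling thread is disabled across cancel/join so
 * mq_notify itself cannot be cancelled in an inappropriate state. */
static int fail_path(pthread_t td, int s)
{
    int cs;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
    pthread_cancel(td);
    pthread_join(td, 0);    /* worker has fully exited ... */
    close(s);               /* ... so no recv can race with this close */
    pthread_setcancelstate(cs, 0);
    return -1;
}

/* Happy path: once the notification is registered, the worker is
 * simply detached and left running. */
static void success_path(pthread_t td)
{
    pthread_detach(td);
}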

Thanks,
Alexey
