Date: Sat, 11 Feb 2023 10:06:03 -0500
From: Rich Felker <dalias@...c.org>
To: Alexey Izbyshev <izbyshev@...ras.ru>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] mq_notify: fix close/recv race on failure path

On Sat, Feb 11, 2023 at 05:45:14PM +0300, Alexey Izbyshev wrote:
> On 2023-02-10 19:29, Rich Felker wrote:
> >On Wed, Dec 14, 2022 at 09:49:26AM +0300, Alexey Izbyshev wrote:
> >>On 2022-12-14 05:26, Rich Felker wrote:
> >>>On Wed, Nov 09, 2022 at 01:46:13PM +0300, Alexey Izbyshev wrote:
> >>>>In case of failure mq_notify closes the socket immediately after
> >>>>sending a cancellation request to the worker thread that is going to
> >>>>call or have already called recv on that socket. Even if we don't
> >>>>consider the kernel behavior when the only descriptor to an object
> >>>>that is being used in a system call is closed, if the socket descriptor is
> >>>>closed before the kernel looks at it, another thread could open a
> >>>>descriptor with the same value in the meantime, resulting in recv
> >>>>acting on a wrong object.
> >>>>
> >>>>Fix the race by moving pthread_cancel call before the barrier wait to
> >>>>guarantee that the cancellation flag is set before the worker thread
> >>>>enters recv.
> >>>>---
> >>>>Other ways to fix this:
> >>>>
> >>>>* Remove the racing close call from mq_notify and surround recv
> >>>>  with pthread_cleanup_push/pop.
> >>>>
> >>>>* Make the worker thread joinable initially, join it before closing
> >>>>  the socket on the failure path, and detach it on the happy path.
> >>>>  This would also require disabling cancellation around join/detach
> >>>>  to ensure that mq_notify itself is not cancelled in an inappropriate
> >>>>  state.
> >>>
> >>>I'd put this aside for a while because of the pthread barrier
> >>>involvement I kinda didn't want to deal with. The fix you have sounds
> >>>like it works, but I think I'd rather pursue one of the other
> >>>approaches, probably the joinable thread one.
> >>>
> >>>At present, the implementation of barriers seems to be buggy (I need
> >>>to dig back up the post about that), and they're also a really
> >>>expensive synchronization tool that goes both directions where we
> >>>really only need one direction (notifying the caller we're done
> >>>consuming the args). I'd rather switch to a semaphore, which is the
> >>>lightest and most idiomatic (at least per present-day musl idioms) way
> >>>to do this.
> >>>
> >>This sounds good to me. The same approach can also be used in
> >>timer_create (assuming it's acceptable to add dependency on
> >>pthread_cancel to that code).
> >>
> >>>Using a joinable thread also lets us ensure we don't leave around
> >>>threads that are waiting to be scheduled just to exit on failure
> >>>return. Depending on scheduling attributes, this probably could be
> >>>bad.
> >>>
> >>I also prefer this approach, though mostly for aesthetic reasons (I
> >>haven't thought about the scheduling behavior). I didn't use it only
> >>because I felt it's a "logically larger" change than simply moving
> >>the pthread_barrier_wait call. And I wasn't aware that barriers are
> >>buggy in musl.
> >
> >Finally following up on this. How do the attached commits look?
> >
> The first and third patches add calls to sem_wait, pthread_join, and
> pthread_detach, which are cancellation points in musl, so
> cancellation needs to be disabled across those calls. I mentioned
> that in my initial mail.
> 
> Also, I wasn't sure if it's fine to just remove
> pthread_attr_setdetachstate call, and I found the following in
> POSIX[1]:
> 
> "The function shall be executed in an environment as if it were the
> start_routine for a newly created thread with thread attributes
> specified by sigev_notify_attributes. If sigev_notify_attributes is
> NULL, the behavior shall be as if the thread were created with the
> detachstate attribute set to PTHREAD_CREATE_DETACHED. Supplying an
> attributes structure with a detachstate attribute of
> PTHREAD_CREATE_JOINABLE results in undefined behavior."
> 
> This language seems to forbid calling sigev_notify_function in the
> context of a joinable thread. And even if musl wants to ignore this,
> PTHREAD_CREATE_JOINABLE must still be set manually if
> sigev_notify_attributes is not NULL.
> 
> Otherwise, the patches look good to me.
> 
> [1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_02
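
For reference, the descriptor-reuse hazard from the original report can be
seen even without threads, since open returns the lowest-numbered free
descriptor. The helper below is purely illustrative (fd_number_reused is a
made-up name, and /dev/null is assumed to exist); it is not part of the
patches:

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a descriptor opened right after close() gets the same
 * number as the one just closed, 0 if not, -1 if open failed. */
static int fd_number_reused(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0) return -1;
    close(a);                       /* number 'a' is free again */

    int b = open("/dev/null", O_WRONLY);
    if (b < 0) return -1;

    int same = (a == b);            /* lowest free number is handed out */
    close(b);
    return same;
}
```

A thread that still holds the old number and calls recv on it after such a
close/open pair would operate on the new, unrelated object.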

Updated patch series attached.
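
As a rough sketch of the overall shape discussed in this thread (semaphore
for the one-way handoff, joinable worker, cancellation disabled in the
caller, join before the caller closes the descriptor on failure), something
like the following could be used. This is not the code in the attached
patches: struct args, worker, start_notify and the fail flag are
hypothetical names, and read on a pipe stands in for recv on the socket.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical argument block living on the caller's stack. */
struct args {
    sem_t sem;  /* posted once the worker has consumed the args */
    int fd;     /* descriptor the worker will block on */
};

static void *worker(void *p)
{
    struct args *a = p;
    int fd = a->fd;      /* copy out while the caller's frame is live */
    char c;

    sem_post(&a->sem);   /* one-directional: "done consuming args" */
    /* Cancellation point, standing in for recv on the socket. */
    if (read(fd, &c, 1) < 0) return 0;
    return 0;
}

/* Returns 0 on success and -1 on failure. 'fail' simulates a failure
 * occurring after the worker has been started. The caller closes fd. */
static int start_notify(int fd, int fail)
{
    struct args a = { .fd = fd };
    pthread_t td;
    void *res;
    int cs;

    sem_init(&a.sem, 0, 0);
    if (pthread_create(&td, 0, worker, &a)) {   /* joinable by default */
        sem_destroy(&a.sem);
        return -1;
    }

    /* sem_wait and pthread_join are cancellation points; keep the
     * caller from being cancelled in an inconsistent state. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
    while (sem_wait(&a.sem) && errno == EINTR);
    sem_destroy(&a.sem);

    if (fail) {
        pthread_cancel(td);
        pthread_join(td, &res);     /* worker is gone after this */
        assert(res == PTHREAD_CANCELED);
        pthread_setcancelstate(cs, 0);
        return -1;                  /* now safe for caller to close fd */
    }

    pthread_detach(td);             /* happy path: nobody will join */
    pthread_setcancelstate(cs, 0);
    return 0;
}
```

Because the join completes before the failure return, the caller can close
the descriptor without racing against a worker that is about to enter or is
already inside the blocking call.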

View attachment "0001-fix-pthred_detach-inadvertently-acting-as-cancellati.patch" of type "text/plain" (1332 bytes)

View attachment "0002-mq_notify-use-semaphore-instead-of-barrier-to-sync-a.patch" of type "text/plain" (2163 bytes)

View attachment "0003-mq_notify-fix-use-after-close-double-close-bug-in-er.patch" of type "text/plain" (1597 bytes)

View attachment "0004-mq_notify-join-worker-thread-before-returning-in-err.patch" of type "text/plain" (1501 bytes)
