Date: Sat, 11 Feb 2023 19:32:01 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] mq_notify: fix close/recv race on failure path

On Sat, Feb 11, 2023 at 11:14:33PM +0300, Alexey Izbyshev wrote:
> On 2023-02-11 22:49, Rich Felker wrote:
> >On Sat, Feb 11, 2023 at 10:28:20PM +0300, Alexey Izbyshev wrote:
> >>On 2023-02-11 21:35, Rich Felker wrote:
> >>>On Sat, Feb 11, 2023 at 09:08:53PM +0300, Alexey Izbyshev wrote:
> >>>>On 2023-02-11 20:59, Rich Felker wrote:
> >>>>>On Sat, Feb 11, 2023 at 08:50:15PM +0300, Alexey Izbyshev wrote:
> >>>>>>On 2023-02-11 20:13, Markus Wichmann wrote:
> >>>>>>>On Sat, Feb 11, 2023 at 10:06:03AM -0500, Rich Felker wrote:
> >>>>>>>>--- a/src/thread/pthread_detach.c
> >>>>>>>>+++ b/src/thread/pthread_detach.c
> >>>>>>>>@@ -5,8 +5,12 @@ static int __pthread_detach(pthread_t t)
> >>>>>>>> {
> >>>>>>>> 	/* If the cas fails, detach state is either already-detached
> >>>>>>>> 	 * or exiting/exited, and pthread_join will trap or cleanup. */
> >>>>>>>>-	if (a_cas(&t->detach_state, DT_JOINABLE, DT_DETACHED) != DT_JOINABLE)
> >>>>>>>>+	if (a_cas(&t->detach_state, DT_JOINABLE, DT_DETACHED) != DT_JOINABLE) {
> >>>>>>>>+		int cs;
> >>>>>>>>+		__pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
> >>>>>>>> 		return __pthread_join(t, 0);
> >>>>>>>                ^^^^^^ I think you forgot to rework this.
> >>>>>>>>+		__pthread_setcancelstate(cs, 0);
> >>>>>>>>+	}
> >>>>>>>> 	return 0;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>
> >>>>>>>I see no other obvious missteps, though.
> >>>>>>>
> >>>>>>Same here, apart from this and misspelled "pthred_detach" in the
> >>>>>>commit message, the patches look good to me.
> >>>>>>
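A minimal sketch of the pattern the quoted hunk intends, using only the public POSIX API instead of musl's internal __-prefixed functions. The helper name join_without_cancellation is hypothetical and this is not musl's code. As Markus points out, the hunk returns from __pthread_join before its second __pthread_setcancelstate call, so the saved state is never restored. Saving the join result first avoids that.

```c
#include <pthread.h>

/* Hypothetical helper: join t without pthread_join acting as a
 * cancellation point for the caller. The join result is saved so the
 * caller's cancel state can be restored before returning (the quoted
 * hunk returns first and skips the restore). */
static int join_without_cancellation(pthread_t t)
{
    int cs, ignored, r;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cs);
    r = pthread_join(t, NULL);
    pthread_setcancelstate(cs, &ignored); /* restore caller's state */
    return r;
}
```
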
> >>>>>>Regarding the POSIX requirement to run sigev_notify_function in the
> >>>>>>context of a detached thread, while it's possible to observe the
> >>>>>>wrong detachstate for a short while via pthread_getattr_np after
> >>>>>>these patches, I'm not sure there is a standard way to do that. Even
> >>>>>>if it exists, this minor issue may not be worth caring about.
> >>>>>
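The window Alexey describes can be probed with pthread_getattr_np, a nonstandard extension provided by glibc and musl, combined with the standard pthread_attr_getdetachstate. A minimal sketch, assuming a GNU-compatible libc; the helper current_detachstate is hypothetical. Called from a notification callback that runs before the parent has detached the worker, it would be expected to report PTHREAD_CREATE_JOINABLE.

```c
#define _GNU_SOURCE /* for pthread_getattr_np (nonstandard) */
#include <pthread.h>

/* Hypothetical helper: returns PTHREAD_CREATE_DETACHED or
 * PTHREAD_CREATE_JOINABLE for the calling thread, or -1 on error. */
static int current_detachstate(void)
{
    pthread_attr_t a;
    int ds;

    if (pthread_getattr_np(pthread_self(), &a) != 0)
        return -1;
    if (pthread_attr_getdetachstate(&a, &ds) != 0)
        ds = -1;
    pthread_attr_destroy(&a); /* attr was initialized by getattr_np */
    return ds;
}
```
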
> >>>>>Would this just be if the notification callback executes before
> >>>>>mq_notify returns in the parent?
> >>>>
> >>>>Yes, it seems so.
> >>>>
> >>>>>I suppose we could have the newly
> >>>>>created thread do the work of making the syscall, handling the error
> >>>>>case, detaching itself on success and reporting back to the
> >>>>>mq_notify function whether it succeeded or failed via the
> >>>>>semaphore/args structure. Thoughts on that?
> >>>>>
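The arrangement proposed here, where the new thread performs the registration itself, detaches itself on success, and reports the outcome through a semaphore in a shared argument structure, can be sketched as follows. This is a simplified illustration under stated assumptions, not musl's implementation: try_register is a hypothetical stand-in for the SYS_mq_notify call, and start_args, worker and start_notifier are likewise invented names. Because the worker detaches before posting, it is already detached before any later notification work runs; on failure it stays joinable and the parent joins it.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-in for the SYS_mq_notify registration step:
 * returns 0 on success or a negative errno value on failure. */
static int try_register(int should_succeed)
{
    return should_succeed ? 0 : -EBADF;
}

/* Shared between parent and worker; lives on the parent's stack. */
struct start_args {
    sem_t sem;          /* posted once the worker knows the outcome */
    int should_succeed; /* input to the hypothetical registration */
    int err;            /* outcome reported back to the parent */
};

static void *worker(void *p)
{
    struct start_args *args = p;
    int err = try_register(args->should_succeed);

    args->err = err;
    /* Detach before reporting success, so the thread is already
     * detached by the time any notification work could run. */
    if (!err)
        pthread_detach(pthread_self());
    /* After this post the parent may return and reuse its stack,
     * so *args must not be touched again. */
    sem_post(&args->sem);
    if (err)
        return NULL; /* still joinable; the parent joins us */

    /* Success path: a real worker would now wait for the
     * notification and run the user's callback. */
    return NULL;
}

/* Returns 0 once the worker has registered, or a negative errno value. */
static int start_notifier(int should_succeed)
{
    struct start_args args = { .should_succeed = should_succeed };
    pthread_t t;
    int r;

    if (sem_init(&args.sem, 0, 0))
        return -errno;
    r = pthread_create(&t, NULL, worker, &args);
    if (r) {
        sem_destroy(&args.sem);
        return -r;
    }
    while (sem_wait(&args.sem) && errno == EINTR)
        ;
    if (args.err)
        pthread_join(t, NULL); /* failed worker never detached itself */
    sem_destroy(&args.sem);
    return args.err;
}
```

Since the parent only reads args.err after sem_wait returns, the semaphore also orders the worker's write of the result before the parent's read.
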
> >>>>Could we just move the pthread_detach call into the worker thread, to
> >>>>the point after pthread_cleanup_pop?
> >>>
> >>>I thought that sounded dubious, in that it might lead to an attempt to
> >>>join a detached thread, but maybe it's safe to assume recv will never
> >>>return if the mq_notify syscall failed...?
> >>>
> >>Actually, because app signals are not blocked when the worker thread
> >>is created, recv can indeed return early with EINTR. But this looks
> >>like just a bug.
> >
> >Yes. While it's not a conformance bug to run with signals unblocked
> >("The signal mask of this thread is implementation-defined.") it's a
> >functional bug to ever introduce threads that don't block all
> >application signals, since these interfere with sigwait & other
> >application control of where signals are delivered. This is an
> >oversight. I'll make it mask all signals.
> >
> >>Otherwise, mq_notify already assumes that recv can't return before
> >>SYS_mq_notify (if it did, the syscall would try to register a closed
> >>fd). I haven't tried to prove it (e.g. maybe recv may need to
> >>allocate something before blocking and hence can fail with ENOMEM?),
> >>but if it's true, I don't see how a failed SYS_mq_notify could cause
> >>recv to return, so joining a detached thread should be impossible if
> >>we make pthread_detach follow recv.
> >
> >I'm thinking for now maybe we should just drop the joining on error,
> >and leave it starting out detached. While recv should not fail, it's
> >obviously possible to make it fail in a seccomp sandbox, and you don't
> >want that to turn into UB inside the implementation. If it does fail,
> >the thread should still exit, but we have no way to synchronize with
> >the mq_notify parent to decide whether it's being joined or not in
> >this case without extra sync machinery...
> >
> By dropping pthread_join we'd avoid introducing a new UB case if
> recv fails unexpectedly, but the existing case that I mentioned
> (SYS_mq_notify trying to register a closed fd) would remain. It
> seems to me that moving SYS_mq_notify into the worker thread as you
> suggested earlier is the cleanest option if we're worrying about
> recv.

OK, I've done that and reworked this series in a way that I think
addresses all the problems.

Rich

Attached patches (text/plain):
  0001-fix-pthread_detach-inadvertently-acting-as-cancellat.patch (1361 bytes)
  0002-mq_notify-use-semaphore-instead-of-barrier-to-sync-a.patch (2163 bytes)
  0003-mq_notify-rework-to-fix-use-after-close-double-close.patch (2744 bytes)
  0004-mq_notify-join-worker-thread-before-returning-in-err.patch (1613 bytes)
  0005-mq_notify-block-all-application-signals-in-the-worke.patch (1869 bytes)
