Date: Wed, 11 Sep 2019 09:52:00 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Reply: Subject: [PATCH] pthread: Fix bug that
 pthread_create may cause priority inversion

On Wed, Sep 11, 2019 at 01:38:38PM +0000, zhaohang (F) wrote:
> Thank you, Rich, for your patch. It helps me a lot.
> 
> But I see that 'return 0' is used to make the child thread exit. In
> that case, if the caller fails to set the child's priority, the
> child thread's return address may be undefined.

The code in __clone is supposed to perform SYS_exit if the start
function returns; this actually matters for users of the public
clone() function, I think.
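As an illustration of the documented behaviour of the public clone() wrapper (not of musl's internal __clone), here is a minimal sketch for Linux. The names child_fn and run_clone_child are hypothetical, and the code assumes a downward-growing stack. The child simply returns from its start function, and the parent reads the return value back as the exit status.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical start function: it returns instead of calling exit. */
static int child_fn(void *arg)
{
    (void)arg;
    return 7;
}

/* Run child_fn via clone() and report the child's exit status,
 * or -1 if anything fails. */
int run_clone_child(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;

    /* Pass the top of the buffer (assumes the stack grows downward).
     * SIGCHLD lets the parent reap the child with waitpid(). */
    pid_t pid = clone(child_fn, stack + stack_size, SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

If the wrapper did not exit after the start function returned, the child would continue at whatever return address happened to be in place, which is the failure mode described in this thread.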

> For example, on my ARM system, PC is set artificially to force the
> child thread to begin at the "start" function, but the child's LR
> (the return address used by 'return 0') is undefined, so if setting
> the priority fails, my system will crash.

At one point this was broken for at least one arch (mips, I think?) so
maybe it's broken for arm too. I'll check.

> Maybe __syscall(SYS_exit) is a better idea?

If I can't confirm that the code in __clone is correct for all archs,
I'll make it explicitly do SYS_exit for now, and revisit after
release, since I don't want to risk introducing a nasty regression
like this. Thanks for catching it!
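The explicit-exit alternative can be sketched with public interfaces only. This is an assumption-laden illustration for Linux, not the actual musl change: start_explicit_exit and run_explicit_exit_child are hypothetical names, and the public syscall() function stands in for musl's internal __syscall. The start function never returns, so nothing depends on the return path set up by the clone code.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical start function that exits via a raw SYS_exit
 * rather than returning to its caller. */
static int start_explicit_exit(void *arg)
{
    (void)arg;
    syscall(SYS_exit, 5);
    return 99; /* not reached */
}

/* Run the start function in a clone() child and report its exit
 * status, or -1 if anything fails. */
int run_explicit_exit_child(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (!stack)
        return -1;

    /* Top of the buffer is passed, assuming a downward-growing stack. */
    pid_t pid = clone(start_explicit_exit, stack + stack_size, SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

SYS_exit terminates only the calling thread; here the child is a single-threaded process, so the parent sees 5 as the exit status.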

Rich

> -----Original Message-----
> From: Rich Felker [mailto:dalias@...ifal.cx] On Behalf Of Rich Felker
> Sent: September 10, 2019 1:50
> To: musl@...ts.openwall.com
> Subject: Re: [musl] Subject: [PATCH] pthread: Fix bug that pthread_create may cause priority inversion
> 
> On Mon, Sep 09, 2019 at 04:54:29PM +0200, Szabolcs Nagy wrote:
> > * zhaohang (F) <zhaohang14@...wei.com> [2019-09-09 13:57:36 +0000]:
> > > diff --git a/src/thread/pthread_create.c b/src/thread/pthread_create.c
> > > index 7d4dc2e..ae08c0f 100644
> > > --- a/src/thread/pthread_create.c
> > > +++ b/src/thread/pthread_create.c
> > > @@ -181,15 +181,8 @@ static int start(void *p)
> > >  {
> > >         struct start_args *args = p;
> > >         if (args->attr) {
> > > -               pthread_t self = __pthread_self();
> > > -               int ret = -__syscall(SYS_sched_setscheduler, self->tid,
> > > -                       args->attr->_a_policy, &args->attr->_a_prio);
> > > -               if (a_swap(args->perr, ret)==-2)
> > > -                       __wake(args->perr, 1, 1);
> > > -               if (ret) {
> > > -                       self->detach_state = DT_DETACHED;
> > > -                       __pthread_exit(0);
> > > -               }
> > > +               if (a_cas(args->perr, -1, -2) == -1)
> > > +                       __wait(args->perr, 0, -2, 1);
> > >         }
> > >         __syscall(SYS_rt_sigprocmask, SIG_SETMASK, &args->sig_mask, 0, _NSIG/8);
> > >         __pthread_exit(args->start_func(args->start_arg));
> > > @@ -367,10 +360,14 @@ int __pthread_create(pthread_t *restrict res, const pthread_attr_t *restrict att
> > >         }
> > > 
> > >         if (attr._a_sched) {
> > > -               if (a_cas(&err, -1, -2)==-1)
> > > -                       __wait(&err, 0, -2, 1);
> > > -               ret = err;
> > > -               if (ret) return ret;
> > > +               ret = -__syscall(SYS_sched_setscheduler, new->tid, attr._a_policy, &attr._a_prio);
> > > +               if (ret) {
> > > +                       new->detach_state = DT_DETACHED;
> > > +                       pthread_cancel(new);
> > > +                       return ret;
> > 
> > the child has the cancel signal blocked so it will never act on the signal.
> 
> Also, pthread_create should not pull in cancellation. Aside from being unnecessary amounts of code that increases lots of costs in static linking (for example, cancellable syscall paths have to be used), there's no reason to use cancellation for something like this where it's not trying to work with arbitrary application code, just a fixed piece of code that admits explicit negotiation of how to continue.
> 
> > but even if that's fixed, the detached child may not get scheduled to 
> > handle the signal for a long time and will take up stack/tid resources.
> 
> That's the side issue I noted which my third patch fixes.
> 
> > i think Rich already has a solution that will deal with these issues.
> 
> Yes, sorry for not posting it sooner. Attached are the drafts that I plan to push soon. (If you see something wrong and they've already been pushed, just let me know and I'll fix it.) Patch 2 is the one that addresses the issue reported here.
> 
> Rich
