Date: Wed, 26 Jun 2019 11:33:55 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Re: seccomp causes pthread_join() to hang

On Wed, Jun 26, 2019 at 08:30:34AM +0100, Radostin Stoyanov wrote:
> On 26/06/2019 00:26, Rich Felker wrote:
> >On Wed, Jun 26, 2019 at 12:18:05AM +0100, Radostin Stoyanov wrote:
> >>Hello,
> >>
> >>In the test suite of CRIU [1] we have noticed an interesting bug
> >>which is caused by commit 8f11e6127fe93093f81a52b15bb1537edc3fc8af
> >>("track all live threads in an AS-safe, fully-consistent linked
> >>list") [2].
> >>
> >>When seccomp is used in a multithreaded application it may cause
> >>pthread_join() to hang.
> >>
> >>This is a minimal application to reproduce the issue:
> >>
> >>
> >>#include <errno.h>
> >>#include <seccomp.h>
> >>#include <stdio.h>
> >>#include <stdlib.h>
> >>#include <string.h>
> >>#include <pthread.h>
> >>#include <unistd.h>
> >>
> >>static void *fn()
> >>{
> >>     scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
> >>     if (!ctx) {
> >>         perror("seccomp_init");
> >>         goto err;
> >>     }
> >>
> >>     if (seccomp_load(ctx) < 0) {
> >>         perror("seccomp_load");
> >>         goto err;
> >>     }
> >>
> >>     /* This should cause SIG_KILL */
> >>     getpid();
> >>err:
> >>     return (void *)1;
> >>}
> >>
> >>int main()
> >>{
> >>     pthread_t t1;
> >>
> >>     if (pthread_create(&t1, NULL, fn, NULL)) {
> >>         perror("pthread_create");
> >>         return -1;
> >>     }
> >>
> >>     if (pthread_join(t1, NULL)) {
> >>         perror("pthread_join");
> >>         return -1;
> >>     }
> >>
> >>     return 0;
> >>}
> >>
> >>
> >>Expected behaviour: Thread t1 should receive SIG_KILL and the main
> >>thread should return 0.
> >>Actual behaviour: pthread_join() hangs.
> >>Reproducibility: Always
> >>Regression: Yes
> >>
> >>
> >>This bug can be reproduced with Alpine 3.10 ($ docker run -it
> >>alpine:3.10 sh).
> >A fundamental property of the pthread API, and part of why threads are
> >a much better primitive than processes for lots of purposes, is that
> >threads are not killable; only whole processes are.
> From the man page of seccomp(2):
> 
>     SECCOMP_RET_KILL_PROCESS: This value results in immediate
> termination of the process, with a core dump. ...
> 
>     SECCOMP_RET_KILL_THREAD (or SECCOMP_RET_KILL): This  value
> results in immediate termination of the thread that made the system
> call. The system call is not executed. Other threads in the same
> thread group will continue to execute. ...

OK, that's really good to know, that they're separate so you can use
KILL_PROCESS safely.
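
For illustration only, here is a rough sketch of what the KILL_PROCESS
variant of the test case could look like. It is not the CRIU test and it
does not use libseccomp; it assumes a raw BPF filter installed with
prctl(), Linux 4.14 or later for SECCOMP_RET_KILL_PROCESS, and it skips
the architecture check a real filter should have. The function names are
made up for the example. The filtered thread runs inside a forked child so
that the parent can observe the whole process dying from SIGSYS instead
of hanging in pthread_join():

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical helper: install a filter that kills the whole process on
 * getpid() and allows every other syscall. Returns 0 on success.
 * Assumption: no architecture check, which a real filter should add. */
static int install_kill_process_filter(void)
{
    struct sock_filter insns[] = {
        /* Load the syscall number from struct seccomp_data. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* getpid: fall through to KILL_PROCESS; anything else: skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getpid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof insns / sizeof insns[0],
        .filter = insns,
    };

    /* Required for unprivileged callers before loading a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

static void *filtered_thread(void *arg)
{
    (void)arg;
    if (install_kill_process_filter() == 0)
        syscall(SYS_getpid); /* forbidden: the kernel kills the process */
    return NULL;             /* reached only if the filter did not load */
}

/* Run the threaded test in a forked child. Returns the signal that
 * terminated the child, 0 if it exited normally, or -1 on error. */
static int run_filtered_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_t t;
        if (pthread_create(&t, NULL, filtered_thread, NULL))
            _exit(2);
        pthread_join(t, NULL); /* does not return if the kill took effect */
        _exit(3);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

If the filter loads, run_filtered_child() should report SIGSYS: the
joining thread dies together with the offending one, so nothing is left
behind in a half-dead process.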

> >  Any configuration
> >that results in a thread being terminated out from under the process
> >has all sorts of extremely dangerous conditions with memory/locks
> >being left in inconsistent state, tid reuse while the application
> >thinks the old thread is still alive, etc., and fundamentally can't be
> >supported. What you're seeing is exposure of a serious existing
> >problem with this seccomp usage, not a regression.
> I wrote "Regression: Yes" because this bug was recently introduced
> and it does not occur in previous versions.
> 
> IMHO causing pthread_join() to hang when a thread has been
> terminated is not expected behaviour, at least because the man page
> for pthread_join(3) states:
> 
>     The pthread_join() function waits for the thread specified by
> thread to terminate. If that thread has already terminated, then
> pthread_join() returns immediately.
> 
> and indeed prior to commit 8f11e612 pthread_join() returns immediately.

...with the process in an unrecoverably broken state, just in ways you
don't notice. For example, any owner-tracked mutexes or FILEs it owned
when it died will be linked into a linked list whose head is in its
pthread structure, which was deallocated when you called pthread_join.

There are also various places where a lock is held on an individual
thread or the thread list to ensure that it doesn't exit (and its tid
isn't reused) until the lock is released. Killing it out from under
the program breaks this invariant and can cause signals to be sent to
the wrong threads/processes, or cause other malfunctions.
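
A small sketch of the simplest form of the lock problem, with made-up
names. Since actually killing a thread is not something that can be
demonstrated safely, the thread here just returns while still holding an
ordinary (non-robust) mutex, as a stand-in for a thread that was killed
before it could unlock. On the usual Linux libcs the mutex stays locked
forever; a killed thread additionally leaves libc-internal state behind,
which this does not show:

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical lock standing in for any application or library lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock and end the thread without ever releasing it. */
static void *exit_while_holding(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    return NULL;
}

/* Returns what pthread_mutex_trylock() reports once the owner is gone:
 * EBUSY means the lock is stuck, 0 would mean it had been released. */
static int trylock_after_owner_gone(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, exit_while_holding, NULL))
        return -1;
    if (pthread_join(t, NULL))
        return -1;
    return pthread_mutex_trylock(&demo_lock);
}
```

Nothing in the join path releases locks on the dead thread's behalf, so
any later pthread_mutex_lock() on demo_lock would block indefinitely.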

This simply is not, and fundamentally cannot be, supported usage.

Rich
