musl - Re: Bug report: Reproduction of seg fault caused by musl thread creation race condition

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABkUXbe6-vg+A5--2jPJCn4QMJ1HTtt4bdL0S7DQv9nvotwj9A@mail.gmail.com>
Date: Mon, 27 Jan 2020 11:59:24 -0800
From: Simon <simonhf@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Bug report: Reproduction of seg fault caused by musl
 thread creation race condition

>
> Upon successful completion, pthread_create() shall store the ID
>

"Upon successfully completion" is somewhat vague wording IMO which could
result in misinterpretation? You seem to have interpreted it as "Upon
function completion", but surely if clone() succeeds, and the new thread
starts running, then there is already "successful completion" even before
pthread_create() has returned to caller?

It's also interesting that the latest man pages for pthread_create() [2]
have changed the wording substantially:

Before returning, a successful call to pthread_create() stores the ID of
> the new thread in the buffer pointed to by thread; this identifier is used
> to refer to the thread in subsequent calls to other pthreads functions.
>

This seems more specific than "Upon successful completion" but still
doesn't exactly tie down whether the thread ID should be written before the
new thread starts execution, only saying (somewhen) "Before returning" the
thread ID should be written by pthread_create(). In a way, it's backing up
the musl thread ID store implementation because it says 'subsequent calls
to other pthreads functions' which implies calls after pthread_create()
returns, whether in the same thread or other threads including the newly
created thread.

any code relying on it is non-portable/racy.
>

I checked the glibc pthread_create() implementation and it appears to set
the thread ID before the new thread executes. So from that POV any code
using glibc relying on this is arguably non-portable, but not racy since
there is no race with the glibc implementation?

Are there reasons you still think the alternate behavior would be
> preferable?
>

I'm not advocating for C/C++ code to access the thread ID like this.
However, assuming (a big if...) that long existing code relying on this --
that hasn't been compiled with musl yet -- exists and is not racy and works
fine with glibc, but will intermittently fail with the musl implementation,
wouldn't it be 'defensive programming' [3] to move the thread ID store in
the musl pthread_create() implementation? Worse, a binary compiled with
musl and this issue might already exist and be "working", but like the code
I provided, only fails if run by gdb or strace or specific timing
situations... and maybe gdb or strace have never been run on it to date? So
it's a sort of 'sleeper' issue?

Personally I think the proper fix is to (a) write the thread ID before the
new thread starts executing because it's common sense that things should
work that way, and existing code written with locks will continue to work,
and existing code written without locks will also just work, and (b)
whether fixing or not, try to update the various pthread_create()
documentation to account for this edge case, so that it's obvious one way
or the other. This would (a) fix the cause of the confusion, (b) avoid
users of pthread_create() having to fall back to common sense, and (c)
optionally make all existing caller implementations more robust?

If the new thread needs its own id, there's an easy and portable way to
> obtain it: pthread_self().
>

Interestingly, my initial implementation with pthread_self() actually
failed via glibc, or at least one of the pthread functions using it seemed
to fail... I cannot remember the exact circumstances unfortunately. Which
is why I started using the thread ID returned by pthread_create() in the
first place, because that didn't fail and no lock was needed. My personal
philosophy with threads and locks is to try and minimize their use
where-ever possible, and that's the path which prompted this email thread
in the first place because I developed the working code with glibc and only
later tried compiling and running with musl and to my surprise it didn't
work...

Anyway, I hope this is food for thought and no worries if you decide not to
change anything. It's been fun bringing this to your attention and
interacting with you! And if there's anything more I can do to help further
then please let me know!

--
Simon

[1] https://pubs.opengroup.org/onlinepubs/007908799/xsh/pthread_create.html
[2] http://man7.org/linux/man-pages/man3/pthread_create.3.html
[3] https://en.wikipedia.org/wiki/Defensive_programming

On Mon, Jan 27, 2020 at 9:51 AM Rich Felker <dalias@...c.org> wrote:

> On Sun, Jan 26, 2020 at 04:33:57PM -0800, Simon wrote:
> > Hello! I recently had some C code which works normally with glibc but seg
> > faults sometimes with musl. I managed to reproduce the seg fault via
> > musl-gcc and Alpine Linux and document it here [1]. Seems to be some kind
> > of race condition, so hopefully you guys also get a seg fault when you
> > follow my reproduction steps. Hope this helps and looking forward to any
> > feedback or helping further if possible, Simon
> >
> > [1] https://gist.github.com/simonhf/6d8097f4d6caa572cc42354f494b20ef
>
> This behavior was originally intentional. In general, if a function is
> specified to modify pointed-to memory as output as a side effect of
> success, that does not give it license to modify it on failure. And
> since pthread_create can't commit to success until after the thread is
> created, it would have to hold back start of the new thread
> unnecessarily to guarantee that the result is written before the new
> thread starts. (Note that it can't simply write the value from both
> the caller and the new thread; the latter could end up writing to the
> pthread_t object after the end of its lifetime.)
>
> Moreover, there is no expectation from the application that it should
> be able to read the result object from the new thread without
> additional synchronization. The wording of the spec is:
>
>     "Upon successful completion, pthread_create() shall store the ID
>      ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     of the created thread in the location referenced by thread."
>
> Until completion of (observation of & synchronization with return
> from) pthread_create, nothing can be said about value of the object;
> access to it is unsynchronized.
>
> With that said, the specification for pthread_create does *allow*
> implementations that store the value speculatively before success:
>
>     "If pthread_create() fails, no new thread is created and the
>     contents of the location referenced by thread are undefined."
>
> I was not aware of this when writing it. So we could change it, but it
> doesn't seem like a very good idea to do so; any code relying on it is
> non-portable/racy. If the new thread needs its own id, there's an easy
> and portable way to obtain it: pthread_self().
>
> Are there reasons you still think the alternate behavior would be
> preferable?
>
> Rich
>

Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.