Date: Tue, 24 Mar 2020 09:53:12 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [Bug] Do not ignore membarrier return code

On Tue, Mar 24, 2020 at 02:20:08PM +0100, Julio Guerra wrote:
> Hello Rich,
> 
> Here are more details on what we did to reproduce the issue.
> You can clone this gist
> https://gist.github.com/vdeturckheim/d420310e272f525824d7e92e7e875024
> and have a look at the run.sh file example in order to get started
> with it. The test.js file does a require of the js bindings of grpc,
> which involves the dlopen.
> 
> What we observed yesterday with this example was:
> - It crashed approximately 9 times out of 10 on aws codebuild with the
> machine BUILD_GENERAL1_SMALL (3 GB memory, 2 vCPUs).
> - It worked every time when the only change was adding membarrier to
> the seccomp profile of the docker run.
> 
> I wanted to give you more details, with stack traces of the
> segfault, by retrying today under gdb, but I can no longer
> reproduce it.
> I'll retry later to see whether the error comes back.
> 
> If what you say about membarrier is true, I think there may be some
> synchronization side-effect of the syscall since, afaik, node uses
> threads in order to load the shared libraries in the libuv.

My best guess, especially since the crash was unpredictable, is that
the stack size on at least one thread is barely sufficient for what
it's doing. The fallback path when the membarrier syscall fails
requires the ability to deliver signals, and if any thread has
insufficient space left on its stack to accept a signal, it will
crash.
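
To make this concrete, here is a minimal sketch in C, not taken from
musl or from the reproducer. Both helper names are hypothetical, and
using SIGSTKSZ as the amount of headroom is an assumption for
illustration, not a documented requirement of the fallback. The first
helper only asks the kernel whether the membarrier syscall is accepted
at all (a seccomp profile may reject it); the second pads a requested
thread stack size so that a signal can still be delivered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the kernel accepts membarrier(2),
 * 0 if the call is rejected (not implemented, or filtered out). */
static int membarrier_available(void)
{
#ifdef SYS_membarrier
    /* Command 0 is MEMBARRIER_CMD_QUERY: it only reports which
     * commands are supported and has no other effect. */
    return syscall(SYS_membarrier, 0, 0) >= 0;
#else
    return 0;
#endif
}

/* Hypothetical helper: round a requested stack size up to the
 * minimum and add headroom for signal delivery. SIGSTKSZ is an
 * assumed margin, chosen only for illustration. */
static size_t stack_size_with_signal_room(size_t wanted)
{
    size_t min = (size_t)PTHREAD_STACK_MIN;
    size_t pad = (size_t)SIGSTKSZ;

    if (wanted < min)
        wanted = min;
    assert(wanted <= SIZE_MAX - pad); /* guard against overflow */
    return wanted + pad;
}
```

A program that creates its own threads could pass the padded value to
pthread_attr_setstacksize() before pthread_create(). When
membarrier_available() returns 0, the signal-based fallback described
above is the path to expect, so the extra headroom matters most there.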

Rich
