Date: Fri, 31 Mar 2023 13:40:09 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Matt Wozniski <godlygeek@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: Unwinding multithreaded musl applications with elfutils fails

* Matt Wozniski <godlygeek@...il.com> [2023-03-30 22:43:28 -0400]:
> I'm unsure if this is an elfutils bug or a musl bug. I suspect both.
> I've already reported this to the elfutils maintainers at
> https://sourceware.org/bugzilla/show_bug.cgi?id=30272
> 
> Using the elfutils eu-stack program or libdw's dwfl_getthread_frames
> API to unwind multithreaded applications linked against musl libc on
> x86-64 fails, getting stuck on `__clone`:

musl has limited, target-specific cfi debug info support. the unwinder
likely needs a

  .cfi_undefined rip

in the clone start function to know where the stack frames end.
(it could figure out the end with the same heuristic that gdb uses,
but apparently elfutils does not do that.)
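
to illustrate the difference, here is a toy model of a cfi-driven unwind
loop. it is only a sketch and not how elfutils or gdb are implemented:
the function name `unwind`, the rule names "saved", "same" and
"undefined", and the `detect_no_progress` flag are all made up for
illustration. "undefined" stands in for `.cfi_undefined rip`, "same"
stands in for a rule that yields the same frame again, and the
no-progress check is a simple stand-in for whatever heuristic a real
unwinder might use.

```python
from typing import Dict, List


def unwind(start: str, callers: Dict[str, str], ra_rule: Dict[str, str],
           max_frames: int = 256,
           detect_no_progress: bool = False) -> List[str]:
    """Return the list of frames visited, innermost first (toy model)."""
    frames: List[str] = []
    func = start
    while len(frames) < max_frames:
        frames.append(func)
        rule = ra_rule.get(func, "saved")
        if rule == "undefined":
            # return address marked undefined: this is the outermost frame
            break
        # "same" models a rule that recovers the very same frame again
        nxt = func if rule == "same" else callers[func]
        if detect_no_progress and nxt == func:
            # hypothetical heuristic: the unwound frame did not change
            break
        func = nxt
    return frames
```

with `callers = {"work": "start", "start": "__clone"}`, marking
`__clone` as "same" fills all 256 frames with repeated `__clone`
entries, while marking it "undefined" (or enabling the no-progress
check) ends the walk after three frames.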

some backtracers may want a cleared frame pointer (rbp=0) to detect
the end, but musl does not guarantee frame pointers either. rbp=0
may be the reason why the backtrace in the main thread works, so
clearing rbp in the thread start code may be enough there too.
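
for reference, a frame-pointer based backtracer can be sketched as below.
this is a toy simulation under assumptions: `memory` is a hypothetical
dict standing in for the thread's stack, and the layout is the usual
x86-64 one where the saved rbp is at [rbp] and the return address at
[rbp+8]. it only applies to code that actually maintains frame pointers.

```python
from typing import Dict, List


def fp_backtrace(memory: Dict[int, int], rbp: int,
                 max_frames: int = 256) -> List[int]:
    """Collect return addresses by following the saved-rbp chain."""
    pcs: List[int] = []
    # rbp == 0 is treated as the end-of-stack marker
    while rbp != 0 and len(pcs) < max_frames:
        pcs.append(memory[rbp + 8])  # return address above the saved rbp
        rbp = memory[rbp]            # move to the caller's frame
    return pcs
```

if the outermost frame's saved rbp is 0 the walk ends cleanly; if the
chain never reaches 0 the only thing stopping it here is `max_frames`.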

musl supports building things without any cfi debug info since c
does not require unwind support, but linux systems nowadays assume
unwind tables are part of the platform abi, so musl-based distros
should probably include them.


> 
> TID 241:
> <uninteresting frames snipped>
> #20 0x00007f6f2f74f08b start
> #21 0x00007f6f2f75138e __clone
> #22 0x00007f6f2f75138e __clone
> #23 0x00007f6f2f75138e __clone
> ...
> #253 0x00007f6f2f75138e __clone
> #254 0x00007f6f2f75138e __clone
> #255 0x00007f6f2f75138e __clone
> eu-stack: tid 241: shown max number of frames (256, use -n 0 for unlimited)
> 
> 
> GDB seems to detect the condition that libdw is getting stuck on,
> emitting a warning message but terminating:
> 
> <uninteresting frames snipped>
> #44 0x00007f8f83e4d08b in start (p=0x7f8f836b8b00) at src/thread/pthread_create.c:203
> #45 0x00007f8f83e4f38e in __clone () at src/thread/x86_64/clone.s:22
> Backtrace stopped: frame did not save the PC
> 
> I suspect the cause for gdb's "frame did not save the PC" warning and
> elfutils' repeated emission of the same frame is an invalid DWARF CIE
> for __clone in musl.
> 
> 
> Reproducer:
> 
> docker run -it --privileged python:3.10-alpine sh
> 
> And in the container:
> 
> apk add --update musl-dbg elfutils
> python3.10 -c "import os, threading; threading.Thread(target=lambda: os.system(f'eu-stack --pid={os.getpid()}')).start()"
> 
> That spawns a thread that forks a subprocess that runs `eu-stack` on
> its parent, and reproduces the issue. If you remove the thread and
> just run:
> 
> python3.10 -c "import os; os.system(f'eu-stack --pid={os.getpid()}')"
> 
> then unwinding succeeds, ending at `_start`.
