musl - Re: Crash in kill(..., SIGHUP) when using SA_ONSTACK

Message-ID: <ZlhoTMKsjk79zT3w@voyager>
Date: Thu, 30 May 2024 13:51:40 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Cc: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Subject: Re: Crash in kill(..., SIGHUP) when using SA_ONSTACK

On Thu, May 30, 2024 at 12:17:59PM +0200, Pablo Correa Gomez wrote:
> On Wed, 29-05-2024 at 09:15 -0400, Rich Felker wrote:
> > On Wed, May 29, 2024 at 02:04:25PM +0200, Pablo Correa Gomez wrote:
> > > Thread 1 "unix" received signal SIGSEGV, Segmentation fault.
> > > 0x00007ffff7fa96e8 in __syscall2 (a2=1, a1=17483, n=62) at
> > > ../arch/x86_64/syscall_arch.h:21
> (gdb) layout asm
>
>  0x7ffff7fa96f9 <kill+7>     movslq %esi,%rsi
>  0x7ffff7fa96fc <kill+10>    mov    $0x3e,%eax
>  0x7ffff7fa9701 <kill+15>    syscall
> >0x7ffff7fa9703 <kill+17>    mov    %rax,%rdi
> [...]
> Does this tell you anything?
>

It tells me that Rich's reasoning was correct. I'll explain further
down.

> > I'm not sure if the crashing code is running on the signal stack or
> > main stack, but here's a thought: is it possible the CI machines are
> > running on a cpu/kernel with some monster AVX512 or whatever
> > extension
> > enabled with register file that doesn't fit in MINSIGSTKSZ?
>
> That might be the case. Would explain why I could not reproduce in my
> 9-year old laptop I was running last month, but I can reproduce it now
> in a new machine with a 13th Gen Intel(R) Core(TM) i7-1360P
>

That is exactly what the program is doing, according to the link you
provided in the OP.

> > It's also possible that the kernel may have some weird behavior
> > deciding if the task is already "running on the alt stack" when the
> > alt stack is embedded in the normal stack like this. Just getting rid
> > of that might be worth trying. If so, whether the problem manifests
> > could be subject to timing of signal delivery (although I would not
> > expect that for synchronously generated signals like here).
> >

Thankfully, we needn't speculate, as Linux is open source. The function
get_sigframe() determines whether the thread is currently executing on
the signal stack by checking whether the sp lies between the stack base
and the stack top. If it does, the frame goes below the current sp
(after skipping the red zone); otherwise, with SA_ONSTACK set, it
starts at the top of the altstack. It then tries to allocate a full
frame. If that doesn't fit (because it was already on an altstack that
overflowed, or it tried to enter an altstack that is too small), it
logs the message "overflowed sigaltstack", which you might find in
dmesg, and returns a bogus address.
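
As a rough illustration of that decision, here is a simplified userspace
model in C. It is not the kernel source: the names model_frame_start(),
on_altstack() and BOGUS_FRAME are made up for this sketch, and it
ignores alignment, SS_DISABLE and SS_AUTODISARM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOGUS_FRAME UINTPTR_MAX /* stand-in for the invalid address the kernel returns */
#define RED_ZONE    128         /* x86-64 ABI red zone below sp */

/* True if sp lies inside the alternate stack (base, base + size]. */
static bool on_altstack(uintptr_t sp, uintptr_t base, size_t size)
{
    return sp > base && sp - base <= size;
}

/* Model of where the signal frame would start, or BOGUS_FRAME on overflow. */
static uintptr_t model_frame_start(uintptr_t sp, bool sa_onstack,
                                   uintptr_t ss_base, size_t ss_size,
                                   size_t frame_size)
{
    bool nested = on_altstack(sp, ss_base, ss_size);
    bool entering = false;

    assert(frame_size > 0);
    if (sp < RED_ZONE + frame_size)
        return BOGUS_FRAME;          /* avoid wrapping in this model */

    sp -= RED_ZONE;                  /* skip the interrupted code's red zone */
    if (sa_onstack && !nested) {
        sp = ss_base + ss_size;      /* switch to the top of the altstack */
        entering = true;
    }
    sp -= frame_size;                /* allocate the full frame */

    /* The frame must still lie inside the altstack if we are (or were) on it. */
    if ((entering || nested) && sp < ss_base)
        return BOGUS_FRAME;
    return sp;
}
```

With a hypothetical 3000-byte frame, a 2048-byte altstack yields
BOGUS_FRAME, while an 8192-byte one yields an address inside the
altstack.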

Due to the bogus address, all calls to unsafe_put_user() in
x64_setup_rt_frame() will fail, and the function will return -EFAULT.
That error is passed to signal_setup_done(), which calls
force_sigsegv(), which in turn reports a SIGSEGV at the "current" IP.
Since this all happens during a syscall, the current IP is the one
directly following the syscall instruction, which matches the marked
instruction at kill+17 in your disassembly.

> > Rich
>

Ciao,
Markus