musl - Re: [Toybox] Not sure how to debug this one.

Message-Id: <202402170323.WAA04412@Stone.Rodents-Montreal.ORG>
Date: Fri, 16 Feb 2024 22:23:04 -0500 (EST)
From: Mouse <mouse@...ents-Montreal.ORG>
To: toybox@...ts.landley.net, musl@...ts.openwall.com
Subject: Re: [Toybox] Not sure how to debug this one.

> While grinding away at release prep, I hit a WEIRD one.  The
> qemu-system-sh4 target got broken [...by...] the commit that changed
> the stdout buffering type.

> The actual _problem_ is that sigsetjmp() is faulting [...]
[...]
> While debugging I made the problem GO AWAY more than once by sticking
> printfs() and similar into the code, [...]

This smells to me like depending on uninitialized stack trash.

> Not siglongjmp, _sigsetjmp_.  Which means it's failing somewhere in:
> 
> https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s

> And I dunno how to stick a printf into superh assembly code.

The simple way to figure that out is to compile something that uses
printf and look at the assembly, either by using -save-temps or
equivalent or by disassembling the binary.
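As a minimal sketch of that approach (the file name probe.c and the function report() are made up for illustration, not taken from toybox or musl), something this small is enough to make the compiler emit a printf call you can then read:

```c
#include <stdio.h>

/* Hypothetical helper: its only job is to make the compiler emit a
 * call to printf with one string argument and one integer argument,
 * so the calling sequence shows up in the generated assembly. */
int report(int value)
{
    /* printf returns the number of characters it wrote. */
    return printf("value=%d\n", value);
}
```

Compiling it with your cross compiler and -S (or with -c -save-temps, which leaves probe.s next to the object file) should show how the arguments get loaded and how the call is made on the target.  The exact compiler name depends on your toolchain, so I am not guessing at it here.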

But, given what sigsetjmp is, sticking a printf in there is likely to
be more difficult than usual.

I know a little about Super-H from some Dreamcast hackery I did a while
back.  I had a look at the .s file you cite - thank you, musl-libc.org,
for resisting the stampede to try to ram HTTPS down everyone's
throat[%]! - and, while I can read it, there is too much I don't know
to really claim to understand it.  I can convert the assembly into
English, certainly, but I don't know how much that would help
(especially since it's the machine language, not assembly language, I
know; the SH assembler I've used is my own, with its own syntax, so I'm
having to guess at the meaning of some parts).

[%] Having HTTP support meant I could just look at the http: version
    instead of needing to wait until I could use a work machine.

> (The problem with trying to configure the kernel to produce core
> dumps and compare against the readelf -d output is it's running as
> PID 1.  [...])

Why is that a problem?  I don't see any statement of what kernel you're
running under, but I can think of two plausible reasons offhand: (1)
the kernel refuses to coredump PID 1 under any circumstances or (2)
there's no writable filesystem to take a coredump on at that point.

To address (1), I'd just build a kernel with that test diked out.

To address (2), I'd normally netboot.  If that's not feasible for some
reason, I'd probably hack on the kernel to remount / read-write before
starting userland.

Of course, you said qemu-something, so you are presumably running under
emulation.  In principle, you could figure this out from emulator
traces, but that is likely to be both extremely difficult and extremely
tedious.

But - you said memset-to-zero on the struct ran but didn't stop it from
failing.  I'd try memset to various other values, to see if you can
find one that makes it stop crashing.  If so, maybe then do two runs,
one with it memset to one value and one with it set to another, take
instruction traces, and see where they differ...?
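A rough sketch of the fill-value experiment, under assumptions: I don't know which struct you are clearing, so struct probe_ctx and try_fill() here are hypothetical stand-ins, and the real test would go wherever the real memset is.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <string.h>

/* Hypothetical stand-in for whatever struct holds the jump buffer
 * in the real code. */
struct probe_ctx {
    sigjmp_buf env;
    int extra;
};

/* Fill the whole struct with one byte value, then call sigsetjmp on
 * the buffer inside it.  A direct call to sigsetjmp returns 0, so a
 * return of 0 means this fill value got through without faulting. */
int try_fill(unsigned char fill)
{
    struct probe_ctx ctx;

    memset(&ctx, fill, sizeof ctx);
    return sigsetjmp(ctx.env, 1);
}
```

Calling try_fill() with, say, 0x00, 0xff, and 0xa5 in separate runs gives you runs that differ only in the fill byte, which is what you want if you are going to compare instruction traces afterwards.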

> It would be really nice if somebody who understood the assembly could
> spot something...

Well, as I said, I can read it, mostly, but I don't know enough of the
context to know whether it's right or not.

/~\ The ASCII				  Mouse
\ / Ribbon Campaign
 X  Against HTML		mouse@...ents-montreal.org
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B