musl - Re: [Toybox] Not sure how to debug this one.

Message-ID: <05774f03-57b5-f524-7a5b-c436237b5d4b@landley.net>
Date: Sat, 17 Feb 2024 07:32:00 -0600
From: Rob Landley <rob@...dley.net>
To: Mouse <mouse@...ents-Montreal.ORG>, toybox@...ts.landley.net,
 musl@...ts.openwall.com
Subject: Re: [Toybox] Not sure how to debug this one.

On 2/16/24 21:23, Mouse wrote:
>> While grinding away at release prep, I hit a WEIRD one.  The
>> qemu-system-sh4 target got broken [...by...] the commit that changed
>> the stdout buffering type.
> 
>> The actual _problem_ is that sigsetjmp() is faulting [...]
> [...]
>> While debugging I made the problem GO AWAY more than once by sticking
>> printfs() and similar into the code, [...]
> 
> This smells to me like depending on uninitialized stack trash.

A write-only function that didn't change its behavior when I memset the
structure before calling it?

Define "uninitialized". (Unless you mean an uninitialized variable inside a
function written entirely in assembly, that's part of a C library shipped and
used by many people for many years?)

>> Not siglongjmp, _sigsetjmp_.  Which means it's failing somewhere in:
>> 
>> https://git.musl-libc.org/cgit/musl/tree/src/signal/sh/sigsetjmp.s
> 
>> And I dunno how to stick a printf into superh assembly code.
> 
> The simple way to figure that out is to compile something that uses
> printf and look at the assembly,

$ ccc/sh4-linux-musl-cross/bin/sh4-linux-musl-objdump -d \
    generated/unstripped/toybox | grep -A 60 '<printf>:'
0045341c <printf>:
  45341c:	86 2f       	mov.l	r8,@-r15
  45341e:	f8 e2       	mov	#-8,r2
  453420:	22 4f       	sts.l	pr,@-r15
  453422:	a8 7f       	add	#-88,r15
  453424:	18 d0       	mov.l	453488 <printf+0x6c>,r0	! 45622c <memcpy>
  453426:	f3 61       	mov	r15,r1
  453428:	18 71       	add	#24,r1
  45342a:	29 21       	and	r2,r1
  45342c:	43 68       	mov	r4,r8
  45342e:	13 62       	mov	r1,r2
  453430:	18 72       	add	#24,r2
  453432:	7a 11       	mov.l	r7,@(40,r1)
  453434:	04 72       	add	#4,r2
  453436:	58 11       	mov.l	r5,@(32,r1)
  453438:	13 63       	mov	r1,r3
  45343a:	69 11       	mov.l	r6,@(36,r1)
  45343c:	f3 65       	mov	r15,r5
  45343e:	aa f2       	fmov	fr10,@r2
  453440:	44 75       	add	#68,r5
  453442:	bb f2       	fmov	fr11,@-r2
  453444:	20 73       	add	#32,r3
  453446:	13 62       	mov	r1,r2
  453448:	10 72       	add	#16,r2
  45344a:	04 72       	add	#4,r2
  45344c:	f3 64       	mov	r15,r4
  45344e:	8a f2       	fmov	fr8,@r2
  453450:	14 e6       	mov	#20,r6
  453452:	9b f2       	fmov	fr9,@-r2
  453454:	13 62       	mov	r1,r2
  453456:	08 72       	add	#8,r2
  453458:	04 72       	add	#4,r2
  45345a:	6a f2       	fmov	fr6,@r2
  45345c:	04 71       	add	#4,r1
  45345e:	7b f2       	fmov	fr7,@-r2
  453460:	4a f1       	fmov	fr4,@r1
  453462:	5b f1       	fmov	fr5,@-r1
  453464:	12 15       	mov.l	r1,@(8,r5)
  453466:	2c 71       	add	#44,r1
  453468:	11 15       	mov.l	r1,@(4,r5)
  45346a:	60 e1       	mov	#96,r1
  45346c:	fc 31       	add	r15,r1
  45346e:	33 15       	mov.l	r3,@(12,r5)
  453470:	32 25       	mov.l	r3,@r5
  453472:	0b 40       	jsr	@r0
  453474:	14 15       	mov.l	r1,@(16,r5)
  453476:	05 d0       	mov.l	45348c <printf+0x70>,r0	! 45500c <vfprintf>
  453478:	05 d4       	mov.l	453490 <printf+0x74>,r4	! 4cc9ec <__stdout_FILE>
  45347a:	0b 40       	jsr	@r0
  45347c:	83 65       	mov	r8,r5
  45347e:	58 7f       	add	#88,r15
  453480:	26 4f       	lds.l	@r15+,pr
  453482:	0b 00       	rts	
  453484:	f6 68       	mov.l	@r15+,r8
  453486:	09 00       	nop	
  453488:	2c 62       	extu.b	r2,r2
  45348a:	45 00       	mov.w	r4,@(r0,r0)
  45348c:	0c 50       	mov.l	@(48,r0),r0
  45348e:	45 00       	mov.w	r4,@(r0,r0)
  453490:	ec c9       	and	#-20,r0
  453492:	4c 00       	mov.b	@(r0,r4),r0

That's just a wrapper around vfprintf(), and vfprintf() is what has the actual
plumbing for escape parsing and such, and is of course running its output
through the ascii FILE * infrastructure.

No, I would wind up CALLING the function, meaning set up a call stack, but how
you're supposed to do that in the middle of setjmp() without corrupting the
registers you're supposed to be saving... even manually making a _system_call_
in that context is... I mean it's _documented_ in
https://man7.org/linux/man-pages/man2/syscall.2.html:

Arch/ABI    Instruction           System  Ret  Ret  Error    Notes
                                  call #  val  val2
───────────────────────────────────────────────────────────────────
superh      trapa #31             r3      r0   r1   -        4, 6

But again, the point is to SAVE those registers, in a defined order, and there's
no WAY to insert something that big into delicate assembly non-intrusively. This
already heisenbugs if my dprintf() is too elaborate.

> either by using -save-temps or
> equivalent or by disassembling the binary.

Did I mention I once stuck print-to-stderr debugging into the uclibc dynamic
loader while doing system bringup on the hexagon architecture? Which couldn't
use any global variables, function calls, or string constants because it hadn't
relocated itself yet, so I assembled a message into a char buffer[] on the stack
and did a syscall(__NR_write).
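
For the curious, a minimal sketch of that trick in C. The name early_debug()
and the "DBG:" tag are made up for illustration, and libc's syscall() wrapper
stands in for the inline trap instruction you'd actually need before
relocation (since calling into libc is exactly what you can't do there):

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: write a short tag to stderr using only stack storage.
   The message is stored one byte at a time, so no string constant is needed. */
long early_debug(char tag)
{
    /* volatile forces each byte store to really happen, rather than letting
       the compiler replace them with a copy from a constant in .rodata */
    volatile char buf[8];
    int n = 0;

    buf[n++] = 'D';
    buf[n++] = 'B';
    buf[n++] = 'G';
    buf[n++] = ':';
    buf[n++] = tag;
    buf[n++] = '\n';

    /* fd 2 is stderr; returns the byte count written, or -1 on error */
    return syscall(SYS_write, 2, buf, (long)n);
}
```

Calling early_debug('A') at one spot and early_debug('B') at the next tells you
how far execution got, assuming stderr goes somewhere you can see.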

Similar to debugging uboot before it relocated itself from NOR flash to sram
(and thus all the locations the linker had provided for symbols outside the
current function and stack were wrong), where debug output was a loop that wrote
a byte at a time to the serial port spinning checking the ready-for-next-byte
status bit. In that case I worked out the constants I needed to subtract from
"string constants" (because a string constant resolves to a pointer to char,
so you can "hello"-0x40800300 and that's a byte offset).
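
A minimal sketch of that fixup, written as a helper rather than inline. The
name unrelocated() and its delta parameter are hypothetical; the real constant
(like the 0x40800300 above) depends on where the linker thought things lived
versus where they actually are. It goes through uintptr_t because pointer
arithmetic that leaves the original object isn't defined on plain pointers:

```c
#include <stdint.h>

/* Hypothetical helper: translate the address the linker assigned into the
   address where the bytes currently live, given delta = linked - actual. */
const char *unrelocated(const char *linked, uintptr_t delta)
{
    /* unsigned arithmetic wraps, so this works whichever address is higher */
    return (const char *)((uintptr_t)linked - delta);
}
```

With that, a pre-relocation debug print would pass unrelocated("hello", delta)
to the byte-at-a-time output loop instead of the bare string constant.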

In theory the same technique would apply to function pointers (every function
name is a pointer) but the TYPE of said pointer is sizeof(function) and doing
math on them isn't really a thing, so you need to typecast to char and then BACK
again (and the syntax for function pointer typecasting has too many parentheses
in non-obvious locations, I generally find it easier to declare a function
pointer variable and then (void *) typecast assign to that), but that doesn't
help if the function then tries to call ANOTHER function, as so many of them do,
so... didn't turn out to be very useful.
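
Roughly what the function pointer version looks like, as a sketch under
assumptions: op_fn, twice() and fixup_fn() are made-up names, and assigning a
void * to a function pointer is something gcc and POSIX systems accept rather
than something strict ISO C promises:

```c
#include <stdint.h>

typedef int (*op_fn)(int);

/* Stand-in for "some function the linker gave an address to" */
int twice(int x)
{
    return 2 * x;
}

/* Hypothetical helper: apply the same delta fixup to a function address.
   Do the math on an integer, then assign through void * to a declared
   function pointer variable instead of writing out the cast syntax. */
op_fn fixup_fn(op_fn linked, uintptr_t delta)
{
    void *raw = (void *)((uintptr_t)linked - delta);
    op_fn fixed = raw;

    return fixed;
}
```

In a normally loaded program the delta is 0, so fixup_fn(twice, 0)(21) just
calls twice(21). And as noted, the fixed-up pointer only helps for the first
call: anything the callee calls still uses unadjusted addresses.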

But that's not what I was asking about HERE. "Magic blob of assembly for
architecture I'm not hugely familiar with is throwing an interrupt, I wonder why?"

> But, given what sigsetjmp is, sticking a printf in there is likely to
> be more difficult than usual.

Define "usual".

Oh, I forgot to mention that qemu-system-blah also has a -s option to launch a
gdbserver on a port. (Which as with all the classic qemu options is now
described in the --help text as "-s    shorthand for -gdb tcp::1234" which is
just sad. And that's _after_ they renamed it from
https://landley.net/notes-2008.html#19-03-2008 when it was apparently -g ?)

I believe qemu -s is emulating a jtag, kgdb is SORT of emulating a jtag, and
then normal gdbserver is providing userspace context debugging. Same protocol,
what differs is what the registers mean and symbol visibility/namespace context.
This is why having an unstripped "vmlinux" is so useful: it's an ELF kernel with
all the symbols so gdb can load it and give you kernel namespace context. Even
if what you actually RAN is one of the repackaged versions, the linking's
already been done so the memory layout's fixed.

Except on sparc, which RELOCATES ITSELF. No, I don't know why either, but I broke
it back under aboriginal and had to get help debugging it:

https://lkml.org/lkml/2011/11/12/57

I do not always have the relevant domain expertise, which is why I try to ask
people who _do_:

https://lkml.org/lkml/2011/12/14/324

(One of the big goals of aboriginal linux and now mkroot is the ability to
package up a test case that somebody can reproduce on their machine without
needing specific hardware, INCLUDING a portable build environment that lets them
rebuild the provided binaries. Hence self-contained qemu-system builds built
with provided portable toolchains that plug into a build that's both "do this,
here's the output" AND "you don't have to use my wrapper, it should be obvious
what it does".)

> I know a little about Super-H from some Dreamcast hackery I did a while
> back.  I had a look at the .s file you cite - thank you, musl-libc.org,
> for resisting the stampede to try to ram HTTPS down everyone's
> throat[%]! - and, while I can read it, there is too much I don't know
> to really claim to understand it.  I can convert the assembly into
> English, certainly, but I don't know how much that would help
> (especially since it's the machine language, not assembly language, I
> know; the SH assembler I've used is my own, with its own syntax, so I'm
> having to guess at the meaning of some parts).

One of the private email replies that didn't go to the list (so I can't politely
publicly reply to it and maybe get more people who know stuff chiming in)
suggested trying it under qemu-user (which reproduced the issue! MUCH easier),
which provided better debug output: I got a register dump (with a program counter
I can probably look up in the output of sh4-linux-musl-objdump -d
generated/unstripped/toybox (or readelf -a) to identify the failing instruction):

Unhandled trap: 0x180
pc=0x3fffe6b0 sr=0x00000001 pr=0x00427c40 fpscr=0x00080000
spc=0x00000000 ssr=0x00000000 gbr=0x004cd9e0 vbr=0x00000000
sgr=0x00000000 dbr=0x00000000 delayed_pc=0x00451644 fpul=0x00000000
r0=0x3fffe6b0 r1=0x00000000 r2=0x00000000 r3=0x000000af
r4=0x00000002 r5=0x00481afc r6=0x407fffd0 r7=0x00000008
r8=0x3fffe6b0 r9=0x00456bb0 r10=0x004cea74 r11=0x3fffe6b0
r12=0x3fffe510 r13=0x00000000 r14=0x00456fd0 r15=0x407ffe88
r16=0x00000000 r17=0x00000000 r18=0x00000000 r19=0x00000000
r20=0x00000000 r21=0x00000000 r22=0x00000000 r23=0x00000000

And that ALSO says it's a trap 0x180 which in qemu:

sh7750_regs.h:#define SH7750_EVT_ILLEGAL_INSTR       0x180 /* General Illegal Instruction */

I boggle. (I also tried backing up in qemu to see where it's generated from, but
alas this is MODERN qemu: the macro defined there is never used in the code, and
the fprintf() is the return code from a function that wraps a function pointer
call for a variable that is never assigned to in the sh architecture, so
probably initialized by a macro I can't grep for. Digging is ongoing.)

He also pointed me at https://sourceware.org/bugzilla/show_bug.cgi?id=27543
which is interesting, but neither sigsetjmp.s nor the setjmp.S it calls have
those two floating point instructions. (Although it saves floating point
registers by number so... is this a synonym for the same thing? Floating point
flags in weird state throwing an exception that's showing up as illegal
instruction but is actually closer to a division by zero error or overflow or
something? Touched floating point register before setting FPU mode? Dunno. Hmmm,
is any of the code between the start of the function and the failure point doing
floating point math? There isn't any in toysh, but I can't guarantee libc
functions like sprintf() don't use some, and somehow leave the FPU in a weird
state that faults trying to dump its registers? I'm guessing here...)

> [%] Having HTTP support meant I could just look at the http: version
>     instead of needing to wait until I could use a work machine.
> 
>> (The problem with trying to configure the kernel to produce core
>> dumps and compare against the readelf -d output is it's running as
>> PID 1.  [...])
> 
> Why is that a problem?  I don't see any statement of what kernel you're
> running under, but I can think of two plausible reasons offhand: (1)
> the kernel refuses to coredump PID 1 under any circumstances or (2)
> there's no writable filesystem to take a coredump on at that point.

The kernel panics immediately upon PID 1 exiting and even if the panic is
deferred until after it's written the core dump instead of a check at the START
of exiting, the writeable filesystem is initramfs which is transient.

Best case scenario would be _if_ the panic happens at the _end_ of exiting
(highly unlikely, but maybe patchable) setting up a network block device and
making it O_DIRECT somehow so the data goes out before the exit without being
delayed by disk cache or nagle or kernel tasklets being asynchronous or anything.

Once upon a time (like 2.0 or something) the kernel continued processing network
packets and such after panic, so setting up your firewall rules and then
intentionally panicking the kernel was considered the most secure way to set up a
Linux router. (Try exploiting a system with NO USERSPACE.) But alas, the kernel
got "improved" so that no longer works. (The theory was freeze file IO _now_
because we dunno what's corrupted, so flushing caches to disk and/or network
filesystems may make things worse, so STOP EVERYTHING and preserve as much
forensic evidence as possible in case of kernel crash dumps or kgdb or kexec on
panic or similar. Needing to keep the device you're writing kernel crash dumps
to active was, of course, one of those truly funky sequencing issues the kernel
got subtly wrong for many years, but the plumbing rewrite that gave us sysfs and
years of working on suspend sequencing finally straightened out the dependencies
I think?)

> To address (1), I'd just build a kernel with that test diked out.
> 
> To address (2), I'd normally netboot.  If that's not feasible for some
> reason, I'd probably hack on the kernel to remount / read-write before
> starting userland.

A) I believe you can still pass rw on the kernel command line, B) you can run a
dumb little statically linked shim.c as rdinit= to do stuff and then have it
exec() the next PID 1 process, that's fairly standard procedure in this context.
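
A minimal sketch of such a shim, not anything shipped in mkroot: chain_to() is
a made-up name, and the choice of what to exec (say /sbin/init) is an
assumption for illustration. Because exec() replaces the process image without
changing the process ID, whatever it launches is still PID 1:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical shim body: do whatever early setup is needed, then hand the
   process (and its PID) over to the real init. Only returns on failure. */
int chain_to(const char *next, char *const argv[])
{
    /* early setup (mounts, debug output, ulimits) would go here */

    execv(next, argv);

    /* execv() only comes back if it failed, so report why */
    perror(next);
    return -1;
}
```

The shim's main() would call chain_to() with the next program's path and its
own argv, and the binary (statically linked) is what rdinit= points at.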

Don't have to modify the kernel for either, but "file reliably written out as
kernel is in the process of panicking"... I already mentioned kgdb, right?
There's a way to get a serial console out of it so the kernel itself is acting
as your debugger:

https://www.kernel.org/doc/html/v4.14/dev-tools/kgdb.html

There's some sort of unholy sacrifice-a-chicken-in-the-summoning-circle layering
violation going on when this happens, but yes you can panic to a kgdb console.
I've done it! Not recently though. (Is suspending linux with kgdb and then
resuming still RCU timeout city on a modern kernel? Or did they fix that?)

But I only pull out gdb when I'm REALLY annoyed. (Cure worse than the disease.
Can't STAND the user interface...)

> Of course, you said qemu-something, so you are presumably running under
> emulation.  In principle, you could figure this out from emulator
> traces, but that is likely to be both extremely difficult and extremely
> tedious.
> 
> But - you said memset-to-zero on the struct ran but didn't stop it from
> failing.  I'd try memset to various other values, to see if you can
> find one that makes it stop crashing.

Except sigsetjmp() is writing to the structure. The function is not supposed to
be reading from the structure. The memset() was to dirty the memory so I could
be sure there wasn't some sort of -EACCES or a soft fault from stack growth
somehow(?) causing a hiccup.
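
For anyone who wants to repeat that experiment with other fill values, a
minimal sketch (dirty_then_save() is a made-up name, and this is not the toybox
code that faults). It touches every byte of the buffer with a known pattern
first, then saves context into it and jumps back once to confirm the saved
state is usable:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <string.h>

/* Hypothetical test helper: returns 2 when both the save and the jump back
   worked, whatever byte the buffer was filled with beforehand. */
int dirty_then_save(int fill)
{
    sigjmp_buf env;
    volatile int passes = 0;    /* volatile so it survives the longjmp */

    /* dirty the memory so any fault from merely touching it happens here */
    memset(env, fill, sizeof(env));

    if (sigsetjmp(env, 1) == 0) {
        passes = 1;             /* direct return from sigsetjmp() */
        siglongjmp(env, 1);     /* jump back to the sigsetjmp() above */
    } else {
        passes += 1;            /* second return, via siglongjmp() */
    }
    return passes;
}
```

If sigsetjmp() really is write-only, the result can't depend on fill.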

Rob