musl - Re: aio_cancel segmentation fault for in progress write requests

Message-ID: <20181207235040.GK23599@brightrain.aerifal.cx>
Date: Fri, 7 Dec 2018 18:50:40 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: aio_cancel segmentation fault for in progress write
 requests

On Fri, Dec 07, 2018 at 04:51:03PM -0600, A. Wilcox wrote:
> On 12/07/18 14:35, Markus Wichmann wrote:
> > On Fri, Dec 07, 2018 at 01:13:44PM -0600, A. Wilcox wrote:
> >> So, my best theory is that running inside a debugger (gdb, valgrind)
> >> makes it slow enough that it no longer races.
> > 
> > Two ideas to investigate further. 1: Produce a coredump ("ulimit -c
> > unlimited"). That won't interfere with timing, but I have no clue if
> > coredumps work with multithreading.
> 
> Core was generated by `./aioWrite '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> 32      src/thread/powerpc64/syscall_cp.s: No such file or directory.
> [Current thread is 1 (LWP 5507)]
> (gdb) bt
> #0  __cp_end () at src/thread/powerpc64/syscall_cp.s:32
> #1  0x00003fffa768f2a4 in __syscall_cp_c (nr=180, u=512512, v=0, w=0,
> x=0, y=0, z=0) at src/thread/pthread_cancel.c:35
> #2  0x00003fffa768e008 in __syscall_cp (nr=<optimized out>, u=<optimized
> out>, v=<optimized out>, w=<optimized out>, x=<optimized out>,
> y=<optimized out>, z=<optimized out>) at src/thread/__syscall_cp.c:20
> #3  0x00003fffa76969f4 in pwrite (fd=<optimized out>, buf=<optimized
> out>, size=<optimized out>, ofs=<optimized out>) at src/unistd/pwrite.c:7
> #4  0x00003fffa763eddc in io_thread_func (ctx=<optimized out>) at
> src/aio/aio.c:240
> #5  0x00003fffa768f76c in start (p=0x3fffa76e8af8) at
> src/thread/pthread_create.c:147
> #6  0x00003fffa769b608 in __clone () at src/thread/powerpc64/clone.s:43
> (gdb) thread 2
> [Switching to thread 2 (LWP 5506)]
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221) at
> ./arch/powerpc64/syscall_arch.h:54
> 54      ./arch/powerpc64/syscall_arch.h: No such file or directory.
> (gdb) bt
> #0  0x00003fffa7637144 in __syscall4 (d=0, c=-1, b=128, a=512, n=221) at
> ./arch/powerpc64/syscall_arch.h:54
> #1  __wait (addr=0x200, waiters=0x0, val=<optimized out>,
> priv=<optimized out>) at src/thread/__wait.c:13
> #2  0x00003fffa763f07c in aio_cancel (fd=<optimized out>,
> cb=0x3fffffafd2b8) at src/aio/aio.c:356
> #3  0x000000012034c044 in main ()
> 
> 
> 221 is SYS_futex.  Wow, that looks wrong.

I don't think thread 2 (odd numbering; it looks like the main thread)
is relevant to the crash; it's already proceeded past whatever was
happening when thread 1 (the io thread) started crashing.

I'm guessing it is stack overflow. Can you dump the registers (to see
the stack pointer value) and info about memory ranges? That should
show how much space is left on the stack at the point of crash. If the
crash is the signal handler trying to run, there will probably be some
space left but less than the size of a signal frame, and the kernel
will probably refrain from moving the stack pointer to include the
signal frame.
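
As a rough sketch of the check described above: take the stack pointer
from gdb's "info registers" (r1 on powerpc64) and the low end of the
thread's stack mapping from "info proc mappings", then compare the gap
against the signal frame size. The function names and the frame size
parameter below are hypothetical, for illustration only; the real frame
size depends on the kernel and architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes between the lowest mapped stack address and the stack pointer.
 * Assumes a downward-growing stack. Returns 0 if sp is already at or
 * below the low end of the mapping. */
size_t stack_bytes_left(uintptr_t sp, uintptr_t stack_low)
{
    return sp > stack_low ? (size_t)(sp - stack_low) : 0;
}

/* Nonzero if a signal frame of frame_size bytes would not fit in the
 * space remaining below sp. frame_size is an assumed value supplied by
 * the caller, not something this code can discover. */
int signal_frame_would_overflow(uintptr_t sp, uintptr_t stack_low,
                                size_t frame_size)
{
    return stack_bytes_left(sp, stack_low) < frame_size;
}
```

If the second function returns nonzero for the values seen in the core,
that would be consistent with the overflow guess, though not proof of
it.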

Rich
