Date: Mon, 10 Dec 2018 10:05:05 +0100
From: Arkadiusz Sienkiewicz <sienkiewiczarkadiusz@...il.com>
To: musl@...ts.openwall.com
Cc: dalias@...c.org
Subject: Re: aio_cancel segmentation fault for in progress write requests

Here are answers to some questions directed to me earlier:

> Could you attach the log from "strace -f -o strace.log ~/aioWrite"?

Sorry, I can't do that. strace is not installed and I don't have root
access. If it is still needed, I will ask the admin to add strace.

> Do the other machines have the same kernel (4.15.0-20-generic)?

No, the other machines use kernel 4.15.0-39-generic.

> Have you tried running the binary built on a successful machine on the
> problematic machine?

Yes, same effect - segmentation fault. The backtrace from gdb is
identical too.

> valgrind might also be a good idea.

alpine-tmp-0:~$ strace -f ./aioWrite
-sh: strace: not found
alpine-tmp-0:~$ valgrind
valgrind             valgrind-di-server   valgrind-listener
alpine-tmp-0:~$ valgrind ./aioWrite
==70339== Memcheck, a memory error detector
==70339== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==70339== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==70339== Command: ./aioWrite
==70339==
==70339== Invalid free() / delete / delete[] / realloc()
==70339==    at 0x4C92B0E: free (vg_replace_malloc.c:530)
==70339==    by 0x4020248: reclaim_gaps (dynlink.c:478)
==70339==    by 0x4020CD0: map_library (dynlink.c:674)
==70339==    by 0x4021818: load_library (dynlink.c:980)
==70339==    by 0x4022607: load_preload (dynlink.c:1075)
==70339==    by 0x4022607: __dls3 (dynlink.c:1585)
==70339==    by 0x4021EDB: __dls2 (dynlink.c:1389)
==70339==    by 0x401FC8E: ??? (in /lib/ld-musl-x86_64.so.1)
==70339==  Address 0x4e9a180 is in a rw- mapped file /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so segment
==70339==
==70339== Can't extend stack to 0x4087948 during signal delivery for thread 2:
==70339==   no stack segment
==70339==
==70339== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==70339==  Access not within mapped region at address 0x4087948
==70339==    at 0x4016834: __syscall3 (syscall_arch.h:29)
==70339==    by 0x4016834: __wake (pthread_impl.h:133)
==70339==    by 0x4016834: cleanup (aio.c:154)
==70339==    by 0x40167B0: io_thread_func (aio.c:255)
==70339==    by 0x4054292: start (pthread_create.c:145)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==    by 0x4053071: ??? (clone.s:21)
==70339==  If you believe this happened as a result of a stack
==70339==  overflow in your program's main thread (unlikely but
==70339==  possible), you can try to increase the size of the
==70339==  main thread stack using the --main-stacksize= flag.
==70339==  The main thread stack size used in this run was 8388608.
==70339==
==70339== HEAP SUMMARY:
==70339==     in use at exit: 81,051 bytes in 9 blocks
==70339==   total heap usage: 9 allocs, 3 frees, 81,051 bytes allocated
==70339==
==70339== LEAK SUMMARY:
==70339==    definitely lost: 0 bytes in 0 blocks
==70339==    indirectly lost: 0 bytes in 0 blocks
==70339==      possibly lost: 0 bytes in 0 blocks
==70339==    still reachable: 81,051 bytes in 9 blocks
==70339==         suppressed: 0 bytes in 0 blocks
==70339== Rerun with --leak-check=full to see details of leaked memory
==70339==
==70339== For counts of detected and suppressed errors, rerun with: -v
==70339== ERROR SUMMARY: 3 errors from 1 contexts (suppressed: 0 from 0)
Killed

Sat, 8 Dec 2018 at 17:18, Florian Weimer <fweimer@...hat.com> wrote:

> * Rich Felker:
>
> > On Fri, Dec 07, 2018 at 09:06:18PM +0100, Florian Weimer wrote:
> >> * Rich Felker:
> >>
> >> > I don't think so. I'm concerned that it's a stack overflow, and that
> >> > somehow the kernel folks have managed to break the MINSIGSTKSZ ABI.
> >>
> >> Probably:
> >>
> >> <https://sourceware.org/bugzilla/show_bug.cgi?id=20305>
> >> <https://sourceware.org/bugzilla/show_bug.cgi?id=22636>
> >>
> >> It's a nasty CPU backwards compatibility problem. Some of the
> >> suggestions I made to work around this are simply wrong; don't take
> >> them too seriously.
> >>
> >> Nowadays, the kernel has a way to disable the %zmm registers, but it
> >> unfortunately does not reduce the save area size.
> >
> > How large is the saved context with the %zmm junk? I measured just
> > ~768 bytes on normal x86_64 without it, and since 2048 is rounded up
> > to a whole page (4096), overflow should not happen until the signal
> > context is something like 3.5k (allowing ~512 bytes for TCB (~128) and
> > 2 simple call frames).
>
> I wrote a test to do some measurements:
>
> <https://sourceware.org/ml/libc-alpha/2018-12/msg00271.html>
>
> The signal handler context is quite large on x86-64 with AVX-512F,
> indeed around 3.5 KiB. It is even larger on ppc64 and ppc64el
> (~4.5 KiB), which I find somewhat surprising.
>
> The cancellation test also includes stack usage from the libgcc
> unwinder. Its stack usage likely differs between versions, so I should
> have included that in the reported results.
>
> Thanks,
> Florian
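The aioWrite source itself is not quoted anywhere in this thread. As a
point of reference, a minimal sketch of the kind of reproducer under
discussion (an in-progress aio_write immediately followed by
aio_cancel) might look like the following; the file name and buffer
size are placeholders, not taken from the original test:

/* Hypothetical reproducer sketch: start an asynchronous write, then
 * cancel it while it may still be in progress. Build with -lrt on
 * glibc; musl provides the aio_* functions in libc itself. */
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        static char buf[64 * 1024];  /* big enough to stay in flight briefly */
        struct aiocb cb;
        int fd = open("aio-test.tmp", O_CREAT | O_RDWR | O_TRUNC, 0600);
        if (fd < 0) { perror("open"); return 1; }

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof buf;

        if (aio_write(&cb)) { perror("aio_write"); return 1; }

        /* Cancel while the request is (possibly) still being serviced. */
        printf("aio_cancel returned %d\n", aio_cancel(fd, &cb));

        /* Wait for the request to complete or be cancelled before exit. */
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);
        printf("aio_error: %d\n", aio_error(&cb));

        close(fd);
        unlink("aio-test.tmp");
        return 0;
}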
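Florian's measurement approach (linked above) can be approximated in a
few lines: capture an address on the stack just before raising a
signal and again inside the handler; the gap is dominated by the
kernel-written signal frame, which is what grows under AVX-512F. This
is not his actual test, just a rough sketch assuming a downward-growing
stack as on x86-64:

/* Approximate the stack cost of signal delivery by comparing an
 * address on the caller's stack with one inside the handler. The
 * result also includes the handler's own frame, so it is an estimate. */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static volatile uintptr_t handler_sp;

static void handler(int sig)
{
        int marker;               /* lives near the top of the handler frame */
        handler_sp = (uintptr_t)&marker;
        (void)sig;
}

int main(void)
{
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);

        int marker;               /* reference point on the current stack */
        raise(SIGUSR1);

        printf("approx. signal delivery stack cost: %zu bytes\n",
               (size_t)((uintptr_t)&marker - handler_sp));
        return 0;
}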