Date: Fri, 20 Jan 2017 16:29:33 -0500 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: Fix pthread_create on some devices failing to initialize guard area On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote: > > On 1/20/17 11:56 AM, Rich Felker wrote: > >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote: > >>Hi All, > >> > >>While deploying test static executable across farm of different > >>embedded systems, found out that pthread_create() is failing > >>systematically on some (very few) arm-linux devices whenever non > >>null stack guard is enabled (that is, also when calling > >>pthread_create with default - i.e. null - attributes since default > >>is a one page of guard). One of those device is for example a > >>Marvell Armada 375 running Linux 3.10.39. Same test code, built with > >>alternative libc implementations (glibc, uClibc) works as expected > >>on those devices. > >> > >> > >>Issue > >> > >>This occurs because of call to mprotect() in pthread_create fails. > >>In current implementation, if guard size is non null, memory for > >>(guard + stack + ...) is first allocated (mmap'ed) with no > >>accessibility (PROT_NONE), then mprotect() is called to re-enable > >>read/write access to (memory + guardsize). Since call to mprotect() > >>systematically fails in this scenario (returning error code EINVAL), > >>it is impossible to create thread. > >Failure is ignored and the memory is assumed to be writable in this > >case, since EINVAL is assumed to imply no MMU. Is this assumption > >wrong in your case, and if so, can you explain why? > > In my case, devices exhibiting issue are not MMU-less, they are > Cortex-A9 devices with valid mmu / page protection working as > expected otherwise. Note that current Musl code assumes ENOSYS means > no MMU and handles it by assuming the system has no page protection > at all. For the case I observe, it is EINVAL which is returned, this > is not ignored, so memory is unmap'ed and pthread_create() fails. In that case I think this is a kernel bug. Do you know why EINVAL is happening? If there's an MMU, Linux should be able to replace the anon PROT_NONE pages with anon RW pages. > >>In proposed patch (attached below), memory for (guard + stack + ...) > >>is first mmap'ed with read/write accessibility, then guard area is > >>protected by calling mprotect() with PROT_NONE on guardsize first > >>bytes of returned memory. This call to mprotect() to remove all > >>accessibility on guard area, with guard area being at beginning of > >>previously mmap'ed memory, works correctly on those platforms having > >>issue with current implementation. Incidentally, this makes the > >>logic more concise to handle both cases (with or without guard) is a > >>more consistent way, and handle systems with partial/invalid page > >>protection implementation (e.g. mprotect() returning ENOSYS) more > >>gracefully since the stack is explicitly created with read/write > >>access. > >This doesn't work correctly on normal systems with mmu, because the > >size of the guard pages is accounted against commit charge. Linux > >should, but AFAIK doesn't, subtract it from commit charge once it's > >changed to PROT_NONE without having been dirtied, but even if this bug > >is fixed on the kernel side, there would still be a moment where > >excess commit charge is consumed and thus where pthread_create might > >spuriously fail or cause allocations in other processes/threads to > >fail. > > > >If the kernel is not allocating actually-usable address ranges for > >PROT_NONE on all nommu systems, I think the only solution is to handle > >EINVAL from mprotect by going back and re-doing the mmap with > >PROT_READ|PROT_WRITE. Do you have any better ideas? > > > >Rich > Had this "deja vu" feeling... reminds me conversation you had in > this thread some time ago elsewhere... > https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html > > Your proposition seems reasonable on nommu system, but again, the > issue observed here is on legit systems with mmu, with mprotect > failing with EINVAL (and not ENOSYS), for some other reason than > system not supporting page protection. Catching EINVAL error > returned by mprotect and falling back to re-doing the mmap would > mean actually silently running without stack guard on system > supporting it, so I believe it is actually legitimate to fail and > return error in that case. But that's difference use case than the > issue I'm observing. > > I took note of Balazs's suggestion (in the thread referenced above) > to switch to a pattern similar to Musl's current one > (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to avoid > those guard pages to actually occupy resources. But I can indeed > observe that this approach fails on some devices (which have valid > mmu), while I'm not sure I'm seeing the issue with first mapping > PROT_READ|PROT_WRITE then mprotect(PROT_NONE) guard area. Latter > approach (as implemented by patch) is, at least, consistent with all > the other implementations out there (I checked glibc's > allocatestack.c, but also e.g. bionic), and couldn't find report of > those failures you are envisaging. Consider the case of guard_size=128M stack_size=128k with Commit_Limit=128M. This will fail with your approach but works perfectly well now. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.