Date: Fri, 20 Jan 2017 16:29:33 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Fix pthread_create on some devices failing to initialize
 guard area

On Fri, Jan 20, 2017 at 01:04:28PM -0800, Eric Hassold wrote:
> 
> On 1/20/17 11:56 AM, Rich Felker wrote:
> >On Fri, Jan 20, 2017 at 11:45:09AM -0800, Eric Hassold wrote:
> >>Hi All,
> >>
> >>While deploying a static test executable across a farm of different
> >>embedded systems, I found that pthread_create() fails
> >>systematically on some (very few) arm-linux devices whenever a
> >>non-zero stack guard is enabled (that is, also when calling
> >>pthread_create with default, i.e. null, attributes, since the default
> >>is one page of guard). One of those devices is, for example, a
> >>Marvell Armada 375 running Linux 3.10.39. The same test code, built
> >>with alternative libc implementations (glibc, uClibc), works as
> >>expected on those devices.
> >>
> >>
> >>Issue
> >>
> >>This occurs because the call to mprotect() in pthread_create fails.
> >>In the current implementation, if the guard size is non-zero, memory
> >>for (guard + stack + ...) is first allocated (mmap'ed) with no
> >>accessibility (PROT_NONE), then mprotect() is called to re-enable
> >>read/write access to the region starting at (memory + guardsize).
> >>Since the call to mprotect() systematically fails in this scenario
> >>(returning error code EINVAL), it is impossible to create a thread.
> >Failure is ignored and the memory is assumed to be writable in this
> >case, since EINVAL is assumed to imply no MMU. Is this assumption
> >wrong in your case, and if so, can you explain why?
> 
> In my case, the devices exhibiting the issue are not MMU-less; they
> are Cortex-A9 devices with a valid MMU and page protection working as
> expected otherwise. Note that the current musl code assumes ENOSYS
> means no MMU and handles it by assuming the system has no page
> protection at all. In the case I observe, it is EINVAL that is
> returned; this is not ignored, so the memory is unmapped and
> pthread_create() fails.

In that case I think this is a kernel bug. Do you know why EINVAL is
happening? If there's an MMU, Linux should be able to replace the
anon PROT_NONE pages with anon RW pages.

> >>In the proposed patch (attached below), memory for (guard + stack +
> >>...) is first mmap'ed with read/write accessibility, then the guard
> >>area is protected by calling mprotect() with PROT_NONE on the first
> >>guardsize bytes of the returned memory. This call to mprotect() to
> >>remove all accessibility on the guard area, with the guard area being
> >>at the beginning of the previously mmap'ed memory, works correctly on
> >>those platforms having issues with the current implementation.
> >>Incidentally, this makes the logic more concise, handles both cases
> >>(with or without guard) in a more consistent way, and handles systems
> >>with partial/invalid page protection implementations (e.g. mprotect()
> >>returning ENOSYS) more gracefully, since the stack is explicitly
> >>created with read/write access.
> >This doesn't work correctly on normal systems with mmu, because the
> >size of the guard pages is accounted against commit charge. Linux
> >should, but AFAIK doesn't, subtract it from commit charge once it's
> >changed to PROT_NONE without having been dirtied, but even if this bug
> >is fixed on the kernel side, there would still be a moment where
> >excess commit charge is consumed and thus where pthread_create might
> >spuriously fail or cause allocations in other processes/threads to
> >fail.
> >
> >If the kernel is not allocating actually-usable address ranges for
> >PROT_NONE on all nommu systems, I think the only solution is to handle
> >EINVAL from mprotect by going back and re-doing the mmap with
> >PROT_READ|PROT_WRITE. Do you have any better ideas?
> >
> >Rich
> Had this "deja vu" feeling... it reminds me of a conversation you had
> elsewhere some time ago, in this thread:
> https://sourceware.org/ml/libc-alpha/2015-09/msg00447.html
> 
> Your proposal seems reasonable on nommu systems, but again, the
> issue observed here is on legitimate systems with an MMU, with
> mprotect failing with EINVAL (and not ENOSYS) for some reason other
> than the system not supporting page protection. Catching the EINVAL
> error returned by mprotect and falling back to re-doing the mmap would
> mean silently running without a stack guard on a system
> supporting it, so I believe it is legitimate to fail and
> return an error in that case. But that's a different use case from the
> issue I'm observing.
> 
> I took note of Balazs's suggestion (in the thread referenced above)
> to switch to a pattern similar to musl's current one
> (mmap(PROT_NONE) + mprotect(stack, READ|WRITE)) in order to keep
> those guard pages from actually occupying resources. But I can indeed
> observe that this approach fails on some devices (which have a valid
> MMU), while I'm not sure I'm seeing the issue when first mapping
> PROT_READ|PROT_WRITE and then calling mprotect(PROT_NONE) on the
> guard area. The latter approach (as implemented by the patch) is, at
> least, consistent with all the other implementations out there (I
> checked glibc's allocatestack.c, but also e.g. bionic), and I couldn't
> find reports of the failures you are envisaging.

Consider the case of guard_size=128M, stack_size=128k with
Commit_Limit=128M. This will fail with your approach, because the
initial read/write mapping of guard plus stack exceeds the limit, but
works perfectly well now.

Rich
