Date: Wed, 1 Mar 2017 16:08:42 -0800
From: Kees Cook <keescook@...omium.org>
To: Russell King - ARM Linux <linux@...linux.org.uk>
Cc: "kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>, 
	Mark Rutland <mark.rutland@....com>, Andy Lutomirski <luto@...nel.org>, Hoeun Ryu <hoeun.ryu@...il.com>, 
	PaX Team <pageexec@...email.hu>, Emese Revfy <re.emese@...il.com>, 
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC][PATCH 5/8] ARM: Implement __arch_rare_write_map/unmap()

On Wed, Mar 1, 2017 at 3:30 AM, Russell King - ARM Linux
<linux@...linux.org.uk> wrote:
> On Tue, Feb 28, 2017 at 09:41:07PM -0800, Kees Cook wrote:
>> On Tue, Feb 28, 2017 at 5:04 PM, Russell King - ARM Linux
>> <linux@...linux.org.uk> wrote:
>> > On Mon, Feb 27, 2017 at 12:43:03PM -0800, Kees Cook wrote:
>> >> Based on grsecurity's ARM pax_{open,close}_kernel() implementation, this
>> >> allows HAVE_ARCH_RARE_WRITE to work on ARM.
>> >
>> > This has the effect that any memory mapped with DOMAIN_KERNEL will
>> > lose its NX status, and may end up being read into the I-cache.
>>
>> Arbitrarily so, or only memory accessed/pre-fetched by the CPU when in
>> this state? i.e. since this is non-preempt, only touches the needed
>> memory, and has the original domain state restored within a few
>> instructions, does this avoid the problem? It seems like the chance
>> for a speculative prefetch from device memory under these conditions
>> should be approaching zero.
>
> "The software that defines a translation table must mark any region of
>  memory that is read-sensitive as execute-never, to avoid the possibility
>  of a speculative fetch accessing the memory region. For example, it must
>  mark any memory region that corresponds to a read-sensitive peripheral
>  as Execute-never."
>
> Also see:
>
> commit 247055aa21ffef1c49dd64710d5e94c2aee19b58
> Author: Catalin Marinas <catalin.marinas@....com>
> Date:   Mon Sep 13 16:03:21 2010 +0100
>
>     ARM: 6384/1: Remove the domain switching on ARMv6k/v7 CPUs
>
> which removed the domain switching I referred to previously.
>
> The way the ARM ARM looks at instruction speculative prefetch is that it
> can happen to any location that is not explicitly marked as Execute-never.
> (This is because the ARM ARM doesn't define an implementation.)  So, we
> have to assume that any location that is not marked XN may be speculatively
> prefetched by the processor.
>
> Device memory can be read-sensitive - eg, reading an interrupt status
> register can clear the pending interrupt bits.

OOoh, yes, this is the part that wasn't getting in my head. I was
stuck thinking it was an XN bypass (due to the icache mention). Got
it.

> A speculative prefetch is a read as far as a device is concerned, so
> bypassing the XN permission by switching the domain to manager mode has
> the effect that the processor can then _legally_ speculatively prefetch
> from a device, and if it happens to hit a device that contains a read
> sensitive location, the side effects of reading that location will
> happen, even though the program did not perform an explicit read.
>
>> Just to make sure I understand: it was only speculative prefetch vs
>> icache, right? Would an icache flush restore the correct permissions?
>
> It's not about permissions, it's about the side effects at the device
> of a read created by the speculative prefetch.
>
>> I'm just pondering alternatives. Also, is there a maximum distance the
>> prefetching spans? i.e. could device memory be guaranteed to be
>> vmapped far enough away from kernel memory to avoid prefetches?
>
> The root cause of this problem is the way we lump both vmalloc() and
> ioremap() mappings into the same memory space (vmalloc region) without
> caring about the domain.

Okay, I patched ptdump (badly) to answer some questions I had on this,
but it looks like everything I was curious about is DOMAIN_KERNEL.

The memory area that write-rarely would want to touch would only be
the .rodata segment, which we could put under a different domain that
looked otherwise identical to DOMAIN_KERNEL. This would apply to
modules too, since those appear to be below the kernel, and not part
of the vmalloc area. Instead of dealing with vmalloc vs ioremap, don't
we just need to adjust the domain of the kernel (or just rodata) and
module mappings then? We don't need to touch vmalloc at all.

Maybe I'm (still) missing something...

-Kees

> If all device memory was guaranteed to be placed under a different
> domain, then this problem would not exist.  In order to achieve that,
> there are several ways I can think of doing it:
>
> 1) Have separate virtual memory regions for ioremap() and vmalloc()
>    We would need to choose an arbitrary limit on the size of these
>    memory pools, which may not suit everyone.
>
> 2) Have vmalloc() grow up as a heap, ioremap() grow down as a stack
>    and a dynamic boundary (aligned to 1 or 2MB) between the two, no
>    mixing allowed.  This avoids the problem with (1) but still results
>    in the required separation.
>
> 3) Align vmalloc region allocations to 2MB, but this would be very
>    wasteful.
>
> 4) Only permit same type (ioremap/vmalloc) of mapping within a 2MB block
>    of vmalloc space.  In other words, a primary allocator of 2MB blocks
>    and a sub-allocator of page-sized blocks (think of the way our
>    page allocator vs slab works.)  Probably going to be subject to
>    fragmentation problems.
>
> 5) Place all vmalloc() and ioremap() mappings under a separate domain,
>    so that all these mappings would be unaffected by the change of
>    domain settings (the resulting permissions would never change.)
>    In other words, DOMAIN_IO becomes DOMAIN_VMALLOC and is used for all
>    mappings in vmalloc space.
>
> The problem with (2) and (5) is teaching pte_alloc_kernel() down to
> pmd_populate_kernel() about the differences - currently, this only ever
> sets up DOMAIN_KERNEL mappings because there's no way for it to know
> what kind of mapping is required.
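
For reference, option (4) above could look roughly like the toy model
below: a primary allocator hands out 2MB blocks, each block is tagged with
one mapping type, and page-sized allocations are only carved out of a
block of the matching type. This is a user-space sketch with invented
names (struct area, area_alloc(), MAP_VMALLOC, and so on), it assumes 4KB
pages, and it is bump-only with no free path, so it says nothing about
the fragmentation concern.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_PAGES 512	/* 2MB block / 4KB pages (assumed page size) */
#define NUM_BLOCKS  8	/* arbitrary size for this toy address space */

enum map_type { MAP_FREE = 0, MAP_VMALLOC, MAP_IOREMAP };

/* One 2MB block: holds mappings of a single type only. */
struct block {
	enum map_type type;
	size_t used_pages;
};

struct area {
	struct block blocks[NUM_BLOCKS];
};

/*
 * Allocate npages of the given type. Returns the index of the first
 * page within the area, or -1 if nothing fits. Requests larger than
 * one block are rejected to keep the sketch short.
 */
static long area_alloc(struct area *a, enum map_type type, size_t npages)
{
	long free_block = -1;
	size_t i;

	assert(type != MAP_FREE);
	if (npages == 0 || npages > BLOCK_PAGES)
		return -1;

	for (i = 0; i < NUM_BLOCKS; i++) {
		struct block *b = &a->blocks[i];

		/* Sub-allocate from an existing block of the same type. */
		if (b->type == type && BLOCK_PAGES - b->used_pages >= npages) {
			long first = (long)(i * BLOCK_PAGES + b->used_pages);

			b->used_pages += npages;
			return first;
		}
		/* Remember the first untyped block as a fallback. */
		if (b->type == MAP_FREE && free_block < 0)
			free_block = (long)i;
	}

	if (free_block < 0)
		return -1;

	/* Claim a fresh 2MB block for this mapping type. */
	a->blocks[free_block].type = type;
	a->blocks[free_block].used_pages = npages;
	return free_block * BLOCK_PAGES;
}

/* Which mapping type owns the 2MB block containing this page? */
static enum map_type area_block_type(const struct area *a, long page)
{
	assert(page >= 0 && page < (long)NUM_BLOCKS * BLOCK_PAGES);
	return a->blocks[page / BLOCK_PAGES].type;
}
```

Since a block never holds both types, each 2MB block could in principle
be given a single domain, which is the separation being asked for.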
>
> --
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.



-- 
Kees Cook
Pixel Security
