Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 12 Aug 2020 10:18:17 -0700
From: Kristen Carlson Accardi <kristen@...ux.intel.com>
To: Joe Lawrence <joe.lawrence@...hat.com>
Cc: keescook@...omium.org, tglx@...utronix.de, mingo@...hat.com,
 bp@...en8.de,  arjan@...ux.intel.com, x86@...nel.org,
 linux-kernel@...r.kernel.org,  kernel-hardening@...ts.openwall.com,
 rick.p.edgecombe@...el.com,  live-patching@...r.kernel.org
Subject: Re: [PATCH v4 00/10] Function Granular KASLR

On Tue, 2020-08-04 at 14:23 -0400, Joe Lawrence wrote:
> On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi
> wrote:
> > Function Granular Kernel Address Space Layout Randomization
> > (fgkaslr)
> > -----------------------------------------------------------------
> > ----
> > 
> > This patch set is an implementation of finer grained kernel address
> > space
> > randomization. It rearranges your kernel code at load time 
> > on a per-function level granularity, with only around a second
> > added to
> > boot time.
> > 
> > Changes in v4:
> > -------------
> > * dropped the patch to split out change to STATIC definition in
> >   x86/boot/compressed/misc.c and replaced with a patch authored
> >   by Kees Cook to avoid the duplicate malloc definitions
> > * Added a section to Documentation/admin-guide/kernel-
> > parameters.txt
> >   to document the fgkaslr boot option.
> > * redesigned the patch to hide the new layout when reading
> >   /proc/kallsyms. The previous implementation utilized a
> > dynamically
> >   allocated linked list to display the kernel and module symbols
> >   in alphabetical order. The new implementation uses a randomly
> >   shuffled index array to display the kernel and module symbols
> >   in a random order.
> > 
> > Changes in v3:
> > -------------
> > * Makefile changes to accommodate
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> > * removal of extraneous ALIGN_PAGE from _etext changes
> > * changed variable names in x86/tools/relocs to be less confusing
> > * split out change to STATIC definition in
> > x86/boot/compressed/misc.c
> > * Updates to Documentation to make it more clear what is preserved
> > in .text
> > * much more detailed commit message for function granular KASLR
> > patch
> > * minor tweaks and changes that make for more readable code
> > * this cover letter updated slightly to add additional details
> > 
> > Changes in v2:
> > --------------
> > * Fix to address i386 build failure
> > * Allow module reordering patch to be configured separately so that
> >   arm (or other non-x86_64 arches) can take advantage of module
> > function
> >   reordering. This support has not be tested by me, but smoke
> > tested by
> >   Ard Biesheuvel <ardb@...nel.org> on arm.
> > * Fix build issue when building on arm as reported by
> >   Ard Biesheuvel <ardb@...nel.org> 
> > 
> > Patches to objtool are included because they are dependencies for
> > this
> > patchset, however they have been submitted by their maintainer
> > separately.
> > 
> > Background
> > ----------
> > KASLR was merged into the kernel with the objective of increasing
> > the
> > difficulty of code reuse attacks. Code reuse attacks reused
> > existing code
> > snippets to get around existing memory protections. They exploit
> > software bugs
> > which expose addresses of useful code snippets to control the flow
> > of
> > execution for their own nefarious purposes. KASLR moves the entire
> > kernel
> > code text as a unit at boot time in order to make addresses less
> > predictable.
> > The order of the code within the segment is unchanged - only the
> > base address
> > is shifted. There are a few shortcomings to this algorithm.
> > 
> > 1. Low Entropy - there are only so many locations the kernel can
> > fit in. This
> >    means an attacker could guess without too much trouble.
> > 2. Knowledge of a single address can reveal the offset of the base
> > address,
> >    exposing all other locations for a published/known kernel image.
> > 3. Info leaks abound.
> > 
> > Finer grained ASLR has been proposed as a way to make ASLR more
> > resistant
> > to info leaks. It is not a new concept at all, and there are many
> > variations
> > possible. Function reordering is an implementation of finer grained
> > ASLR
> > which randomizes the layout of an address space on a function level
> > granularity. We use the term "fgkaslr" in this document to refer to
> > the
> > technique of function reordering when used with KASLR, as well as
> > finer grained
> > KASLR in general.
> > 
> > Proposed Improvement
> > --------------------
> > This patch set proposes adding function reordering on top of the
> > existing
> > KASLR base address randomization. The over-arching objective is
> > incremental
> > improvement over what we already have. It is designed to work in
> > combination
> > with the existing solution. The implementation is really pretty
> > simple, and
> > there are 2 main area where changes occur:
> > 
> > * Build time
> > 
> > GCC has had an option to place functions into individual .text
> > sections for
> > many years now. This option can be used to implement function
> > reordering at
> > load time. The final compiled vmlinux retains all the section
> > headers, which
> > can be used to help find the address ranges of each function. Using
> > this
> > information and an expanded table of relocation addresses,
> > individual text
> > sections can be suffled immediately after decompression. Some data
> > tables
> > inside the kernel that have assumptions about order require re-
> > sorting
> > after being updated when applying relocations. In order to modify
> > these tables,
> > a few key symbols are excluded from the objcopy symbol stripping
> > process for
> > use after shuffling the text segments.
> > 
> > Some highlights from the build time changes to look for:
> > 
> > The top level kernel Makefile was modified to add the gcc flag if
> > it
> > is supported. Currently, I am applying this flag to everything it
> > is
> > possible to randomize. Anything that is written in C and not
> > present in a
> > special input section is randomized. The final binary segment 0
> > retains a
> > consolidated .text section, as well as all the individual .text.*
> > sections.
> > Future work could turn off this flags for selected files or even
> > entire
> > subsystems, although obviously at the cost of security.
> > 
> > The relocs tool is updated to add relative relocations. This
> > information
> > previously wasn't included because it wasn't necessary when moving
> > the
> > entire .text segment as a unit. 
> > 
> > A new file was created to contain a list of symbols that objcopy
> > should
> > keep. We use those symbols at load time as described below.
> > 
> > * Load time
> > 
> > The boot kernel was modified to parse the vmlinux elf file after
> > decompression to check for our interesting symbols that we kept,
> > and to
> > look for any .text.* sections to randomize. The consolidated .text
> > section
> > is skipped and not moved. The sections are shuffled randomly, and
> > copied
> > into memory following the .text section in a new random order. The
> > existing
> > code which updated relocation addresses was modified to account for
> > not just a fixed delta from the load address, but the offset that
> > the function
> > section was moved to. This requires inspection of each address to
> > see if
> > it was impacted by a randomization. We use a bsearch to make this
> > less
> > horrible on performance. Any tables that need to be modified with
> > new
> > addresses or resorted are updated using the symbol addresses parsed
> > from the
> > elf symbol table.
> > 
> > In order to hide our new layout, symbols reported through
> > /proc/kallsyms
> > will be displayed in a random order.
> > 
> > Security Considerations
> > -----------------------
> > The objective of this patch set is to improve a technology that is
> > already
> > merged into the kernel (KASLR). This code will not prevent all
> > attacks,
> > but should instead be considered as one of several tools that can
> > be used.
> > In particular, this code is meant to make KASLR more effective in
> > the presence
> > of info leaks.
> > 
> > How much entropy we are adding to the existing entropy of standard
> > KASLR will
> > depend on a few variables. Firstly and most obviously, the number
> > of functions
> > that are randomized matters. This implementation keeps the existing
> > .text
> > section for code that cannot be randomized - for example, because
> > it was
> > assembly code. The less sections to randomize, the less entropy. In
> > addition,
> > due to alignment (16 bytes for x86_64), the number of bits in a
> > address that
> > the attacker needs to guess is reduced, as the lower bits are
> > identical.
> > 
> > Performance Impact
> > ------------------
> > There are two areas where function reordering can impact
> > performance: boot
> > time latency, and run time performance.
> > 
> > * Boot time latency
> > This implementation of finer grained KASLR impacts the boot time of
> > the kernel
> > in several places. It requires additional parsing of the kernel ELF
> > file to
> > obtain the section headers of the sections to be randomized. It
> > calls the
> > random number generator for each section to be randomized to
> > determine that
> > section's new memory location. It copies the decompressed kernel
> > into a new
> > area of memory to avoid corruption when laying out the newly
> > randomized
> > sections. It increases the number of relocations the kernel has to
> > perform at
> > boot time vs. standard KASLR, and it also requires a lookup on each
> > address
> > that needs to be relocated to see if it was in a randomized section
> > and needs
> > to be adjusted by a new offset. Finally, it re-sorts a few data
> > tables that
> > are required to be sorted by address.
> > 
> > Booting a test VM on a modern, well appointed system showed an
> > increase in
> > latency of approximately 1 second.
> > 
> > * Run time
> > The performance impact at run-time of function reordering varies by
> > workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a
> > kernel
> > build with finer grained KASLR was about 1% slower than a kernel
> > with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of 
> > L1-icache-load-misses. Other workloads were examined as well, with
> > varied
> > results. Some workloads performed significantly worse under
> > FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it
> > will
> > depend on the code flow whether or not finer grained KASLR will
> > impact
> > your workload, and how the underlying code was designed. Because
> > the layout
> > changes per boot, each time a system is rebooted the performance of
> > a workload
> > may change.
> > 
> > Future work could identify hot areas that may not be randomized and
> > either
> > leave them in the .text section or group them together into a
> > single section
> > that may be randomized. If grouping things together helps, one
> > other thing to
> > consider is that if we could identify text blobs that should be
> > grouped together
> > to benefit a particular code flow, it could be interesting to
> > explore
> > whether this security feature could be also be used as a
> > performance
> > feature if you are interested in optimizing your kernel layout for
> > a
> > particular workload at boot time. Optimizing function layout for a
> > particular
> > workload has been researched and proven effective - for more
> > information
> > read the Facebook paper "Optimizing Function Placement for Large-
> > Scale
> > Data-Center Applications" (see references section below).
> > 
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased
> > by
> > about 3%. The compressed image is also increased due to the header
> > files,
> > as well as the extra relocations that must be added. You can expect
> > fgkaslr
> > to increase the size of the compressed image by about 15%.
> > 
> > Memory Usage
> > ------------
> > fgkaslr increases the amount of heap that is required at boot time,
> > although this extra memory is released when the kernel has finished
> > decompression. As a result, it may not be appropriate to use this
> > feature on
> > systems without much memory.
> > 
> > Building
> > --------
> > To enable fine grained KASLR, you need to have the following config
> > options
> > set (including all the ones you would use to build normal KASLR)
> > 
> > CONFIG_FG_KASLR=y
> > 
> > In addition, fgkaslr is only supported for the X86_64 architecture.
> > 
> > Modules
> > -------
> > Modules are randomized similarly to the rest of the kernel by
> > shuffling
> > the sections at load time prior to moving them into memory. The
> > module must
> > also have been build with the -ffunction-sections compiler option.
> > 
> > Although fgkaslr for the kernel is only supported for the X86_64
> > architecture,
> > it is possible to use fgkaslr with modules on other architectures.
> > To enable
> > this feature, select
> > 
> > CONFIG_MODULE_FG_KASLR=y
> > 
> > This option is selected automatically for X86_64 when
> > CONFIG_FG_KASLR is set.
> > 
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also
> > disables
> > fgkaslr. It is also possible to disable fgkaslr separately by
> > booting with
> > fgkaslr=off on the commandline.
> > 
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained
> > ASLR.
> > This paper in particular contributed the most to my implementation
> > design
> > as well as my overall understanding of the problem space:
> > 
> > Selfrando: Securing the Tor Browser against De-anonymization
> > Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> > 
> > For more information on how function layout impacts performance,
> > see:
> > 
> > Optimizing Function Placement for Large-Scale Data-Center
> > Applications,
> > G. Ottoni, B. Maher
> > 
> > Kees Cook (2):
> >   x86/boot: Allow a "silent" kaslr random byte fetch
> >   x86/boot/compressed: Avoid duplicate malloc() implementations
> > 
> > Kristen Carlson Accardi (8):
> >   objtool: Do not assume order of parent/child functions
> >   x86: tools/relocs: Support >64K section headers
> >   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> >   x86: Make sure _etext includes function sections
> >   x86/tools: Add relative relocs for randomized functions
> >   x86: Add support for function granular KASLR
> >   kallsyms: Hide layout
> >   module: Reorder functions
> > 
> >  .../admin-guide/kernel-parameters.txt         |   7 +
> >  Documentation/security/fgkaslr.rst            | 172 ++++
> >  Documentation/security/index.rst              |   1 +
> >  Makefile                                      |   6 +-
> >  arch/x86/Kconfig                              |   4 +
> >  arch/x86/Makefile                             |   5 +
> >  arch/x86/boot/compressed/Makefile             |   9 +-
> >  arch/x86/boot/compressed/fgkaslr.c            | 811
> > ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c              |   4 -
> >  arch/x86/boot/compressed/misc.c               | 157 +++-
> >  arch/x86/boot/compressed/misc.h               |  30 +
> >  arch/x86/boot/compressed/utils.c              |  11 +
> >  arch/x86/boot/compressed/vmlinux.symbols      |  17 +
> >  arch/x86/include/asm/boot.h                   |  15 +-
> >  arch/x86/kernel/vmlinux.lds.S                 |  17 +-
> >  arch/x86/lib/kaslr.c                          |  18 +-
> >  arch/x86/tools/relocs.c                       | 143 ++-
> >  arch/x86/tools/relocs.h                       |   4 +-
> >  arch/x86/tools/relocs_common.c                |  15 +-
> >  include/asm-generic/vmlinux.lds.h             |  18 +-
> >  include/linux/decompress/mm.h                 |  12 +-
> >  include/uapi/linux/elf.h                      |   1 +
> >  init/Kconfig                                  |  26 +
> >  kernel/kallsyms.c                             | 163 +++-
> >  kernel/module.c                               |  81 ++
> >  tools/objtool/elf.c                           |   8 +-
> >  26 files changed, 1670 insertions(+), 85 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> > 
> > 
> > base-commit: 11ba468877bb23f28956a35e896356252d63c983
> > -- 
> > 2.20.1
> > 
> 
> Apologies in advance if this has already been discussed elsewhere,
> but I
> did finally get around to testing the patchset against the
> livepatching
> kselftests.
> 
> The livepatching kselftests fail as all livepatches stall their
> transitions.  It appears that reliable (ORC) stack unwinding is
> broken
> when fgkaslr is enabled.
> 
> Relevant config options:
> 
>   CONFIG_ARCH_HAS_FG_KASLR=y
>   CONFIG_ARCH_STACKWALK=y
>   CONFIG_FG_KASLR=y
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_HAVE_RELIABLE_STACKTRACE=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_MODULE_FG_KASLR=y
>   CONFIG_TEST_LIVEPATCH=m
>   CONFIG_UNWINDER_ORC=y
> 
> The livepatch transitions are stuck along this call path:
> 
>   klp_check_stack
>     stack_trace_save_tsk_reliable
>       arch_stack_walk_reliable
>   
>           /* Check for stack corruption */
>           if (unwind_error(&state))
>                   return -EINVAL;
> 
> where the unwinder error is set by unwind_next_frame():
> 
>   arch/x86/kernel/unwind_orc.c
>   bool unwind_next_frame(struct unwind_state *state)
>  
> sometimes here:
>  
>   	/* End-of-stack check for kernel threads: */
>   	if (orc->sp_reg == ORC_REG_UNDEFINED) {
>   		if (!orc->end)
>   			goto err;
>   
>   		goto the_end;
>   	}
> 
> or here:
> 
>   	/* Prevent a recursive loop due to bad ORC data:
> */                                                                   
>              
>   	if (state->stack_info.type == prev_type
> &&                                                                   
>                       
>   	    on_stack(&state->stack_info, (void *)state->sp,
> sizeof(long))
> &&                                                               
>   	    state->sp <= prev_sp)
> {                                                                    
>                                     
>   		orc_warn_current("stack going in the wrong direction?
> at %pB\n",                                                           
>   				 (void
> *)orig_ip);                                                          
>                                
>   		goto
> err;                                                                 
>                                                  
>   	}
> 
> (and probably other places the ORC unwinder gets confused.)
> 
> 
> It also manifests itself in other, more visible ways.  For example, a
> kernel module that calls dump_stack() in its init function or even
> /proc/<pid>/stack:
> 
> (fgkaslr on)
> ------------
> 
> Call Trace:
>  ? dump_stack+0x57/0x73
>  ? 0xffffffffc0850000
>  ? mymodule_init+0xa/0x1000 [dumpstack]
>  ? do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  ? do_init_module+0x5a/0x220
>  ? load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  ? __do_sys_finit_module+0xa8/0x110
>  ? do_syscall_64+0x47/0x80
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> 
> 
> fgkaslr=off
> -----------
> 
> Call Trace:
>  dump_stack+0x57/0x73
>  ? 0xffffffffc04f2000
>  mymodule_init+0xa/0x1000 [readonly]
>  do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  do_init_module+0x5a/0x220
>  load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  __do_sys_finit_module+0xa8/0x110
>  do_syscall_64+0x47/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> [<0>] __do_sys_wait4+0x83/0x90
> [<0>] do_syscall_64+0x47/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 
> I would think fixing and verifying these latter cases would be easier
> than
> chasing livepatch transitions (but would still probably fix klp case,
> too).
> Perhaps Josh or someone has other ORC unwinder tests that could be
> used?
> 
> -- Joe
> 

OK, I have root caused these failures to the fact that the relocs for
the orc_unwind_ip table that I use to adjust the values of the
orc_unwind_ip table after randomization were incorrect because the
table is sorted at build time now, thus making the relocs invalid. I
can fix this either by turning off BUILDTIME_TABLE_SORT (and now the
relocs are fine), or by ignoring any relocs in the orc_unwind_ip table
and adjusting without relocs. I think it makes sense to just unset
BUILDTIME_TABLE_SORT if CONFIG_FG_KASLR and continue to rely on relocs
to work since it is a useless step anyway.


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.