kernel-hardening - Re: x86: PIE support and option to extend KASLR randomization

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJcbSZH6hwaWKrvUZR33ExYaZaWKMSv4tJJA3yZkniLvLbTFMw@mail.gmail.com>
Date: Tue, 29 Aug 2017 12:34:04 -0700
From: Thomas Garnier <thgarnie@...gle.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Herbert Xu <herbert@...dor.apana.org.au>, "David S . Miller" <davem@...emloft.net>, 
	Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, "H . Peter Anvin" <hpa@...or.com>, 
	Peter Zijlstra <peterz@...radead.org>, Josh Poimboeuf <jpoimboe@...hat.com>, 
	Arnd Bergmann <arnd@...db.de>, Matthias Kaehlcke <mka@...omium.org>, 
	Boris Ostrovsky <boris.ostrovsky@...cle.com>, Juergen Gross <jgross@...e.com>, 
	Paolo Bonzini <pbonzini@...hat.com>, Radim Krčmář <rkrcmar@...hat.com>, 
	Joerg Roedel <joro@...tes.org>, Tom Lendacky <thomas.lendacky@....com>, 
	Andy Lutomirski <luto@...nel.org>, Borislav Petkov <bp@...e.de>, Brian Gerst <brgerst@...il.com>, 
	"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>, "Rafael J . Wysocki" <rjw@...ysocki.net>, 
	Len Brown <len.brown@...el.com>, Pavel Machek <pavel@....cz>, Tejun Heo <tj@...nel.org>, 
	Christoph Lameter <cl@...ux.com>, Paul Gortmaker <paul.gortmaker@...driver.com>, 
	Chris Metcalf <cmetcalf@...lanox.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	"Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>, Nicolas Pitre <nicolas.pitre@...aro.org>, 
	Christopher Li <sparse@...isli.org>, "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>, 
	Lukas Wunner <lukas@...ner.de>, Mika Westerberg <mika.westerberg@...ux.intel.com>, 
	Dou Liyang <douly.fnst@...fujitsu.com>, Daniel Borkmann <daniel@...earbox.net>, 
	Alexei Starovoitov <ast@...nel.org>, Masahiro Yamada <yamada.masahiro@...ionext.com>, 
	Markus Trippelsdorf <markus@...ppelsdorf.de>, Steven Rostedt <rostedt@...dmis.org>, 
	Kees Cook <keescook@...omium.org>, Rik van Riel <riel@...hat.com>, 
	David Howells <dhowells@...hat.com>, Waiman Long <longman@...hat.com>, Kyle Huey <me@...ehuey.com>, 
	Peter Foley <pefoley2@...oley.com>, Tim Chen <tim.c.chen@...ux.intel.com>, 
	Catalin Marinas <catalin.marinas@....com>, Ard Biesheuvel <ard.biesheuvel@...aro.org>, 
	Michal Hocko <mhocko@...e.com>, Matthew Wilcox <mawilcox@...rosoft.com>, 
	"H . J . Lu" <hjl.tools@...il.com>, Paul Bolle <pebolle@...cali.nl>, Rob Landley <rob@...dley.net>, 
	Baoquan He <bhe@...hat.com>, Daniel Micay <danielmicay@...il.com>, 
	"the arch/x86 maintainers" <x86@...nel.org>, Linux Crypto Mailing List <linux-crypto@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, xen-devel <xen-devel@...ts.xenproject.org>, 
	kvm list <kvm@...r.kernel.org>, Linux PM list <linux-pm@...r.kernel.org>, 
	linux-arch <linux-arch@...r.kernel.org>, 
	Sparse Mailing-list <linux-sparse@...r.kernel.org>, 
	Kernel Hardening <kernel-hardening@...ts.openwall.com>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Peter Zijlstra <a.p.zijlstra@...llo.nl>, 
	Borislav Petkov <bp@...en8.de>
Subject: Re: x86: PIE support and option to extend KASLR randomization

On Fri, Aug 25, 2017 at 8:05 AM, Thomas Garnier <thgarnie@...gle.com> wrote:
> On Fri, Aug 25, 2017 at 1:04 AM, Ingo Molnar <mingo@...nel.org> wrote:
>>
>> * Thomas Garnier <thgarnie@...gle.com> wrote:
>>
>>> With the fix for function tracing, the hackbench results have an
>>> average of +0.8 to +1.4% (from +8% to +10% before). With a default
>>> configuration, the numbers are closer to 0.8%.
>>>
>>> On the .text size, with gcc 4.9 I see +0.8% on default configuration
>>> and +1.180% on the ubuntu configuration.
>>
>> A 1% text size increase is still significant. Could you look at the disassembly,
>> where does the size increase come from?
>
> I will take a look, in this current iteration I added the .got and
> .got.plt so removing them will remove a big (even if they are small,
> we don't use them to increase perf).
>
> What do you think about the perf numbers in general so far?

I looked at the size increase. I could identify two common cases:

1) PIE sometime needs two instructions to represent a single
instruction on mcmodel=kernel.

For example, this instruction plays on the sign extension (mcmodel=kernel):

mov    r9,QWORD PTR [r11*8-0x7e3da060] (8 bytes)

The address 0xffffffff81c25fa0 can be represented as -0x7e3da060 using
a 32S relocation.

with PIE:

lea    rbx,[rip+<off>] (7 bytes)
mov    r9,QWORD PTR [rbx+r11*8] (6 bytes)

2) GCC does not optimize switches in PIE in order to reduce relocations:

For example the switch in phy_modes [1]:

static inline const char *phy_modes(phy_interface_t interface)
{
    switch (interface) {
    case PHY_INTERFACE_MODE_NA:
        return "";
    case PHY_INTERFACE_MODE_INTERNAL:
        return "internal";
    case PHY_INTERFACE_MODE_MII:
        return "mii";

Without PIE (gcc 7.2.0), the whole table is optimize to be one instruction:

   0x000000000040045b <+27>:    mov    rdi,QWORD PTR [rax*8+0x400660]

With PIE (gcc 7.2.0):

   0x0000000000000641 <+33>:    movsxd rax,DWORD PTR [rdx+rax*4]
   0x0000000000000645 <+37>:    add    rax,rdx
   0x0000000000000648 <+40>:    jmp    rax
....
   0x000000000000065d <+61>:    lea    rdi,[rip+0x264]        # 0x8c8
   0x0000000000000664 <+68>:    jmp    0x651 <main+49>
   0x0000000000000666 <+70>:    lea    rdi,[rip+0x2bc]        # 0x929
   0x000000000000066d <+77>:    jmp    0x651 <main+49>
   0x000000000000066f <+79>:    lea    rdi,[rip+0x2a8]        # 0x91e
   0x0000000000000676 <+86>:    jmp    0x651 <main+49>
   0x0000000000000678 <+88>:    lea    rdi,[rip+0x294]        # 0x913
   0x000000000000067f <+95>:    jmp    0x651 <main+49>

That's a deliberate choice, clang is able to optimize it (clang-3.8):

   0x0000000000000963 <+19>:    lea    rcx,[rip+0x200406]        # 0x200d70
   0x000000000000096a <+26>:    mov    rdi,QWORD PTR [rcx+rax*8]

I checked gcc and the code deciding to fold the switch basically do
not do it for pic to reduce relocations [2].

The switches are the biggest increase on small functions but I don't
think they represent a large portion of the difference (number 1 is).

A side note, while testing gcc 7.2.0 on hackbench I have seen the PIE
kernel being faster by 1% across multiple runs (comparing 50 runs done
across 5 reboots twice). I don't think PIE is faster than a
mcmodel=kernel but recent versions of gcc makes them fairly similar.

[1] http://elixir.free-electrons.com/linux/v4.13-rc7/source/include/linux/phy.h#L113
[2] https://github.com/gcc-mirror/gcc/blob/7977b0509f07e42fbe0f06efcdead2b7e4a5135f/gcc/tree-switch-conversion.c#L828

>
>>
>> Thanks,
>>
>>         Ingo
>
>
>
> --
> Thomas



-- 
Thomas
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.