kernel-hardening - Re: [RFC PATCH] x86/entry/64: randomize kernel stack offset upon system call

Message-Id: <EBF96DB2-B25D-437B-A067-F187E031BC3E@amacapital.net>
Date: Mon, 11 Feb 2019 07:54:59 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Reshetova, Elena" <elena.reshetova@...el.com>
Cc: Andy Lutomirski <luto@...nel.org>, Jann Horn <jannh@...gle.com>,
 "Perla, Enrico" <enrico.perla@...el.com>,
 Peter Zijlstra <peterz@...radead.org>,
 "kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>,
 "tglx@...utronix.de" <tglx@...utronix.de>,
 "mingo@...hat.com" <mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>,
 "keescook@...omium.org" <keescook@...omium.org>,
 "tytso@....edu" <tytso@....edu>
Subject: Re: [RFC PATCH] x86/entry/64: randomize kernel stack offset upon system call



On Feb 10, 2019, at 10:39 PM, Reshetova, Elena <elena.reshetova@...el.com> wrote:

>> On Sat, Feb 9, 2019 at 3:13 AM Reshetova, Elena
>> <elena.reshetova@...el.com> wrote:
>>> 
>>>> On Fri, Feb 08, 2019 at 01:20:09PM +0000, Reshetova, Elena wrote:
>>>>>> On Fri, Feb 08, 2019 at 02:15:49PM +0200, Elena Reshetova wrote:
>>>> 
>>>>>> 
>>>>>> Why can't we change the stack offset periodically from an interrupt or
>>>>>> so, and then have every later entry use that.
>>>>> 
>>>>> Hm... This sounds more complex conceptually - we cannot touch
>>>>> stack when it is in use, so we have to periodically probe for a
>>>>> good time (when process is in userspace I guess) to change it from an interrupt?
>>>>> IMO trampoline stack provides such a good clean place for doing it and we
>>>>> have stackleak there doing stack cleanup, so would make sense to keep
>>>>> these features operating together.
>>>> 
>>>> The idea was to just change a per-cpu (possible per-task if you ctxsw
>>>> it) offset that is used on entry to offset the stack.
>>>> So only entries after the change will have the updated offset, any
>>>> in-progress syscalls will continue with their current offset and will be
>>>> unaffected.
>>> 
>>> Let me try to write this into simple steps to make sure I understand your
>>> approach:
>>> 
>>> - create a new per-stack value (and potentially its per-cpu "shadow") called stack_offset = 0
>>> - periodically issue an interrupt, and inside it walk the process tree and
>>>  update stack_offset randomly for each process
>>> - when a process makes a new syscall, it subtracts stack_offset value from top_of_stack()
>>>  and that becomes its new top_of_stack() for that system call.
>>> 
>>> Smth like this?
>> 
>> I'm proposing something that is conceptually different.
> 
> OK, looks like I fully misunderstood what you meant indeed.
> The reason I didn’t reply to your earlier answer is that I started to look
> into unwinder code & logic to get at least a slight clue on how things
> can be done since I haven't looked in it almost at all before (I wasn't changing
> anything with regards to it, so I didn't have to). So, I meant to come back
> with a more rigid answer than just "let me study this first"...

Fair enough.

> 
>> You are, conceptually, changing the location of the stack.  I'm suggesting that
>> you leave the stack alone and, instead, randomize how you use the
>> stack.
> 
> 
> So, yes, instead of having:
> 
> allocated_stack_top
> random_offset
> actual_stack_top
> pt_regs
> ...
> and so on
> 
> We will have smth like:
> 
> allocated_stack_top = actual_stack_top
> pt_regs
> random_offset
> ...
> 
> So, conceptually we have the same amount of randomization with 
> both approaches, but it is applied very differently. 

Exactly.
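
To make the two layouts concrete, here is a minimal userspace sketch in C. It is not kernel code: the names (struct layout, layout_offset_before_regs, layout_offset_after_regs) and the PT_REGS_SIZE value are assumptions chosen for illustration. It only models the address arithmetic, with the stack growing down from its top.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative size only (21 saved 8-byte registers); an assumption,
 * not taken from the thread. */
#define PT_REGS_SIZE 168u

/* Hypothetical model of where things land on a downward-growing stack. */
struct layout {
    uintptr_t pt_regs;      /* lowest address of the saved register frame */
    uintptr_t first_frame;  /* address at which later stack frames begin */
};

/* First layout: the random offset sits above pt_regs, so pt_regs moves. */
static struct layout layout_offset_before_regs(uintptr_t top, uintptr_t off)
{
    struct layout l;

    assert(top >= off + PT_REGS_SIZE);
    l.pt_regs = top - off - PT_REGS_SIZE;
    l.first_frame = l.pt_regs;
    return l;
}

/* Second layout: pt_regs stays at the top; the offset is applied below it. */
static struct layout layout_offset_after_regs(uintptr_t top, uintptr_t off)
{
    struct layout l;

    assert(top >= off + PT_REGS_SIZE);
    l.pt_regs = top - PT_REGS_SIZE;
    l.first_frame = l.pt_regs - off;
    return l;
}
```

In this model both functions place the later stack frames at the same address for a given offset. Only the position of pt_regs differs: it is randomized in the first layout and fixed in the second.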

> 
> Security-wise I will have to think more about whether the second approach has any negative
> consequences, in addition to positive ones. As a paranoid security person,
> you might want to merge both approaches and randomize both places (before and
> after pt_regs) with different offsets, but I guess this would be out of the question, right?

It’s not out of the question, but it needs some amount of cost vs benefit analysis.  The costs are complexity, speed, and a reduction in available randomness for any given amount of memory consumed.

> 
> I am not that experienced with exploits, but we have been
> talking now with Jann and Enrico on this, so I think it is best that they comment
> directly here. I am just wondering if having pt_regs in a fixed place can
> be an advantage for an attacker under any scenario...

If an attacker has write-what-where (i.e. can write controlled values to controlled absolute virtual addresses), then I expect that pt_regs is a pretty low ranking target.  But it may be a fairly juicy target if you have a stack buffer overflow that lets an attacker write to a controlled *offset* from the stack. We used to keep thread_info at the bottom of the stack, and that was a great attack target.

But there’s an easier mitigation: just do regs->cs |= 3 or something like that in the exit code. Then any such attack can only corrupt *user* state.  The performance impact would be *very* low, since this could go in the asm path that’s only used for IRET to user mode.
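
As a rough illustration of the regs->cs |= 3 idea, the userspace sketch below models only the bit manipulation. struct fake_regs, force_user_cs and returns_to_user are hypothetical names, and the real change would live in the asm exit path, not in C like this. The low two bits of a segment selector hold the requested privilege level, so setting both forces ring 3.

```c
#include <assert.h>
#include <stdint.h>

#define RPL_MASK 3u  /* low two bits of a segment selector: privilege level */

/* Hypothetical stand-in for the IRET portion of the saved registers. */
struct fake_regs {
    uint64_t ip;
    uint64_t cs;
    uint64_t flags;
    uint64_t sp;
    uint64_t ss;
};

/* Force the saved CS to ring 3 just before returning to user mode, so a
 * corrupted frame cannot ask for a return into kernel mode. */
static void force_user_cs(struct fake_regs *regs)
{
    assert(regs);
    regs->cs |= RPL_MASK;
}

/* True if the saved CS requests privilege level 3 (user mode). */
static int returns_to_user(const struct fake_regs *regs)
{
    return (regs->cs & RPL_MASK) == 3;
}
```

For example, a frame whose cs was overwritten with a ring-0 selector such as 0x10 becomes 0x13 after force_user_cs(), while a legitimate user selector such as 0x33 is unchanged. Both selector values are example inputs for this sketch.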