Date: Mon, 11 Feb 2019 06:39:04 +0000
From: "Reshetova, Elena" <elena.reshetova@...el.com>
To: Andy Lutomirski <luto@...nel.org>, Jann Horn <jannh@...gle.com>, "Perla,
 Enrico" <enrico.perla@...el.com>
CC: Peter Zijlstra <peterz@...radead.org>,
	"kernel-hardening@...ts.openwall.com" <kernel-hardening@...ts.openwall.com>,
	"tglx@...utronix.de" <tglx@...utronix.de>, "mingo@...hat.com"
	<mingo@...hat.com>, "bp@...en8.de" <bp@...en8.de>, "keescook@...omium.org"
	<keescook@...omium.org>, "tytso@....edu" <tytso@....edu>
Subject: RE: [RFC PATCH] x86/entry/64: randomize kernel stack offset upon
 system call

> On Sat, Feb 9, 2019 at 3:13 AM Reshetova, Elena
> <elena.reshetova@...el.com> wrote:
> >
> > > On Fri, Feb 08, 2019 at 01:20:09PM +0000, Reshetova, Elena wrote:
> > > > > On Fri, Feb 08, 2019 at 02:15:49PM +0200, Elena Reshetova wrote:
> > >
> > > > >
> > > > > Why can't we change the stack offset periodically from an interrupt or
> > > > > so, and then have every later entry use that.
> > > >
> > > > Hm... This sounds more complex conceptually - we cannot touch
> > > > stack when it is in use, so we have to periodically probe for a
> > > > good time (when process is in userspace I guess) to change it from an
> > > > interrupt?
> > > > IMO trampoline stack provides such a good clean place for doing it and we
> > > > have stackleak there doing stack cleanup, so would make sense to keep
> > > > these features operating together.
> > >
> > > The idea was to just change a per-cpu (possible per-task if you ctxsw
> > > it) offset that is used on entry to offset the stack.
> > > So only entries after the change will have the updated offset, any
> > > in-progress syscalls will continue with their current offset and will be
> > > unaffected.
> >
> > Let me try to write this into simple steps to make sure I understand your
> > approach:
> >
> > - create a new per-stack value (and potentially its per-cpu "shadow") called
> >   stack_offset = 0
> > - periodically issue an interrupt, and inside it walk the process tree and
> >   update stack_offset randomly for each process
> > - when a process makes a new syscall, it subtracts stack_offset value from
> >   top_of_stack() and that becomes its new top_of_stack() for that system call.
> >
> > Smth like this?
> 
> I'm proposing something that is conceptually different.

OK, it looks like I had fully misunderstood what you meant.
The reason I didn't reply to your earlier answer is that I started to look
into the unwinder code & logic to get at least a slight clue about how things
could be done, since I had barely looked at it before (I wasn't changing
anything related to it, so I didn't have to). So, I meant to come back
with a more solid answer than just "let me study this first"...
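
Just to make the steps above concrete, this is roughly the bookkeeping I had
pictured. It is only a standalone userspace sketch with made-up names (in the
kernel the offset would have to be a new per-task field refreshed from some
periodic context), so please treat it as an illustration of my reading, not a
proposal:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

struct fake_task {
	uintptr_t stack_top;		/* stands in for task_top_of_stack() */
	uintptr_t stack_offset;		/* the hypothetical new per-task value */
};

/* periodic ("interrupt") path: refresh a task's offset */
static void refresh_stack_offset(struct fake_task *p)
{
	/* keep it 8-byte aligned and below 2K */
	p->stack_offset = (uintptr_t)(rand() & 0x7f8);
}

/* syscall entry path: the stack top that this syscall would actually use */
static uintptr_t entry_stack_top(const struct fake_task *p)
{
	return p->stack_top - p->stack_offset;
}

int main(void)
{
	struct fake_task t = { .stack_top = (uintptr_t)0xffff0000u };

	for (int i = 0; i < 3; i++) {
		refresh_stack_offset(&t);
		printf("entry stack top: %#lx\n", (unsigned long)entry_stack_top(&t));
	}
	return 0;
}

Which, as you point out, is conceptually "changing the location of the stack",
not changing how the stack is used.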

> You are,
> conceptually, changing the location of the stack.  I'm suggesting that
> you leave the stack alone and, instead, randomize how you use the
> stack.


So, yes, instead of having:

allocated_stack_top
random_offset
actual_stack_top
pt_regs
...
and so on

We would have something like:

allocated_stack_top = actual_stack_top
pt_regs
random_offset
...

So, conceptually we have the same amount of randomization with 
both approaches, but it is applied very differently. 

Security-wise I will have to think more about whether the second approach has any
negative consequences, in addition to the positive ones. As a paranoid security person,
you might want to merge both approaches and randomize both places (before and
after pt_regs) with different offsets, but I guess this would be out of the question, right?

I am not that experienced with exploits, but we have been
discussing this with Jann and Enrico, so I think it is best if they comment
directly here. I am just wondering whether having pt_regs in a fixed place can
be an advantage for an attacker under any scenario...

> In plain C, this would consist of adding roughly this snippet
> in do_syscall_64() and possibly other entry functions:
> 
> if (randomize_stack()) {
>   void *dummy = alloca(rdrand() & 0x7f8);
> 
>   /* Make sure the compiler doesn't optimize out the alloca. */
>   asm volatile ("" : : "rm" (dummy));
> }
> 
> ... do the actual syscall work here.
> 
> This has a few problems, namely that the generated code might be awful
> and that alloca is more or less banned in the kernel.  I suppose
> alloca could be unbanned in the entry C code, but this could also be
> done fairly easily in the asm code.  You'd just need to use a register
> to store whatever is needed to put RSP back in the exit code.  

Yes, I was actually thinking of doing it in assembly, since I think
it would end up smaller and clearer that way. I just need to get the details
slowly into place.
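
In the meantime, to convince myself that the alloca variant above really moves
the later locals around, something like this standalone userspace toy should
illustrate it (obviously not kernel code, and the names are made up):

#include <alloca.h>
#include <stdio.h>
#include <stdlib.h>

/* noinline, so its frame really ends up below the alloca()'ed gap */
static void __attribute__((noinline)) fake_syscall_work(void)
{
	int local;

	/* where did this "syscall's" locals land this time? */
	printf("local at %p\n", (void *)&local);
}

static void fake_syscall_entry(void)
{
	/* rand() stands in for rdrand(); 8-byte aligned offset below 2K */
	void *dummy = alloca(rand() & 0x7f8);

	/* keep the compiler from optimizing the alloca away */
	asm volatile ("" :: "r" (dummy));

	fake_syscall_work();
}

int main(void)
{
	for (int i = 0; i < 5; i++)
		fake_syscall_entry();
	return 0;
}

The local should land at a different address on most calls, which is the effect
we are after; the remaining question is how to get the same effect cleanly from
the entry assembly without alloca.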


> The
> obvious way would be to use RBP, but it's plausible that using a
> different callee-saved register would make the unwinder interactions
> easier to get right.

This is what I started looking into; please bear with me, all of this is new
to me, so I am slow...

> 
> With this approach, you don't modify any of the top_of_stack()
> functions or macros at all -- the top of stack isn't changed.

Yes, understood. 

> 
> >
> > I think it is close to what Andy has proposed
> > in his reply, but the main difference is that you propose to do this via an interrupt.
> > And the main reasoning for doing this via interrupt would be not to affect
> > syscall performance, right?
> >
> > The problem I see with interrupt approach is how often that should be done?
> > Because we don't want to end up with situation when we issue it too often, since
> > it is not going to be very light-weight operation (update all processes), and we
> > don't want it to be too rarely done that we end up with processes that execute
> > many syscalls with the same offset. So, we might have a situation when some
> > processes will execute a number of syscalls with same offset and some will
> > change their offset more than once without even making a single syscall.
> 
> I bet that any attacker worth their salt could learn the offset by
> doing a couple of careful syscalls and looking for cache and/or TLB
> effects.  This might make the whole exercise mostly useless.  Isn't
> RDRAND supposed to be extremely fast, though?
> 
> I usually benchmark like this:
> 
> $ ./timing_test_64 10M sys_enosys
> 10000000 loops in 2.53484s = 253.48 nsec / loop
> 
> using https://git.kernel.org/pub/scm/linux/kernel/git/luto/misc-tests.git/

Thank you for the pointer!
With everyone's suggestions I now have a much better set of tools for
my next measurements.
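
For the RDRAND cost itself, I will probably start from a trivial userspace
measurement before touching the entry path; something along these lines
(a standalone sketch, needs a CPU with RDRAND, build with gcc -O2 -mrdrnd):

#include <stdio.h>
#include <time.h>
#include <immintrin.h>

int main(void)
{
	unsigned long long val = 0, sum = 0;
	const long loops = 10 * 1000 * 1000;
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < loops; i++) {
		/* no retry on failure here; good enough for a ballpark number */
		_rdrand64_step(&val);
		/* mask down to the 8-byte aligned, below-2K offset range */
		sum += val & 0x7f8;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	double ns = (end.tv_sec - start.tv_sec) * 1e9 +
		    (end.tv_nsec - start.tv_nsec);
	printf("%ld loops, %.2f ns per rdrand (sum %llu)\n",
	       loops, ns / loops, sum);
	return 0;
}

It is of course not the same as measuring on the actual syscall path, but it
should give a ballpark number to put next to the per-syscall figures above.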

Best Regards,
Elena.
