musl - Re: [RFC] Support for segmentation-hardened SafeStack

Message-ID: <b3ca872a-da02-60e3-40dc-a3dde96d4c1c@intel.com>
Date: Mon, 26 Sep 2016 10:28:40 -0700
From: "LeMay, Michael" <michael.lemay@...el.com>
To: Rich Felker <dalias@...c.org>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: Re: [RFC] Support for segmentation-hardened SafeStack



On 9/22/2016 16:42, Rich Felker wrote:
> On Thu, Sep 22, 2016 at 11:00:45PM +0000, LeMay, Michael wrote:
>> Hi,
>>
>> I submitted several patches to LLVM and Clang to harden SafeStack
>> using segmentation on x86-32 [1]. See [2] for general background on
>> SafeStack. On Linux, I have been testing my compiler changes with a
>> modified version of musl. I currently plan to submit my musl patches
>> if and when the prerequisite LLVM and Clang patches are accepted.
>> One of my LLVM patches depends on the details of my musl patches,
>> which is the main reason that I am sending this RFC now.
> My understanding is that this is a different, incompatible ABI for
> i386, i.e. code that uses safestack is not calling-compatible with
> code that doesn't, and vice versa. Is that true? This is probably the
> most significant determining factor in how we treat it.
That's a good question, but I don't know how to answer it directly.

A salient requirement is that all code that runs while restricted 
segment limits are in effect for DS and ES must use appropriate segment 
override prefixes.  This is to direct safe stack accesses to SS and thus 
avoid violating the segment limits on DS and ES.  It is still possible 
to call code that does not satisfy that requirement (I will refer to 
such functions as "standard functions", in contrast to 
"segmentation-aware functions") in the same program, but the segment 
registers would need to be reverted to contain flat segment descriptors 
before calling such code.  Otherwise, a segment limit violation would 
occur if a standard function attempted to access data on the stack using 
DS or ES.  Of course, reverting to flat segment descriptors would leave 
the safe stacks unprotected.
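
To make the limit violation concrete, here is a minimal sketch in C of
the check that the processor applies to a flat-based, expand-up data
segment.  The struct, the function name, and the example limit values
are hypothetical and only model the behavior; they are not taken from
my patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a data segment with base 0 (illustration only). */
struct segment {
    uint32_t limit; /* highest valid byte offset within the segment */
};

/* Returns true if an access of `size` bytes at `addr` stays within the
 * segment limit, i.e. addr + size - 1 <= limit, computed without
 * overflowing 32-bit arithmetic. */
static bool access_ok(struct segment seg, uint32_t addr, uint32_t size)
{
    assert(size > 0);
    return addr <= seg.limit && size - 1 <= seg.limit - addr;
}
```

With a flat descriptor (limit 0xFFFFFFFF) every access passes.  With a
restricted limit, an access by a standard function to a safe stack
located above the limit fails the check, which corresponds to the
exception described above.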

Another consideration is that if a standard function invokes a 
segmentation-aware function, then there is the potential for the 
standard function to pass the address of a safe stack allocation to the 
segmentation-aware function.  This is problematic, because my proposed 
compiler pass for inserting segment override prefixes assumes that 
pointers to the safe stack are not passed between functions.  The pass 
only performs intraprocedural analysis.  If an allocation's address is 
taken and passed to a subroutine, then the segmentation-hardened 
SafeStack pass will move that allocation to the unsafe stack.
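
As an illustration of that rule, consider the following small C
example.  The function names are hypothetical, and the comments describe
where I would expect the pass to place each local under the rule stated
above; the actual placement depends on the compiler's analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: receives a pointer to a caller's local buffer. */
static void fill(char *buf, size_t n, char c)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = c;
}

static int checksum(void)
{
    int total = 0; /* address never taken: can remain on the safe stack */
    char buf[8];   /* address is passed to fill(): expected to be moved
                      to the unsafe stack */

    fill(buf, sizeof buf, 1);
    for (size_t i = 0; i < sizeof buf; i++)
        total += buf[i];
    return total;
}
```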

My proposed musl patches currently assume that all code that runs after 
restricted segment limits are enabled is segmentation-aware.
>
>> Specifically, https://reviews.llvm.org/D19762 assumes that the
>> unsafe stack pointer is stored at offset 0x24 in the musl thread
>> control block. This would be between the pid and tsd_used variables
>> that are currently defined. I also propose storing the base address
>> of the unsafe stack at offset 0x28, but the compiler would not
>> depend on that.
> Almost none of the existing fields are public; I think the only
> exception is the stack-protector canary. IMO you should prefer
> avoiding per-libc offset variation over preserving existing offsets.
The offset 0x24 is at least consistent with Bionic, according to the 
comments in the definition of 
X86TargetLowering::getSafeStackPointerLocation in 
<LLVM>/lib/Target/X86/X86ISelLowering.cpp.  However, I can't find any 
mention of SafeStack in the Bionic header linked from those LLVM comments.
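
For reference, the proposed layout can be sketched as a C structure.
This is a hypothetical model rather than the real musl definition: the
fields before pid are collapsed into an opaque array, 32-bit field
widths are assumed to match i386, and the names unsafe_stack_ptr and
unsafe_stack_base are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the start of the i386 thread control block. */
struct tcb_model {
    uint32_t before_pid[8];     /* 0x00..0x1f: existing fields, not modeled */
    int32_t  pid;               /* 0x20: assumed, since 0x24 follows pid */
    uint32_t unsafe_stack_ptr;  /* 0x24: offset the compiler depends on */
    uint32_t unsafe_stack_base; /* 0x28: used by libc only */
    int32_t  tsd_used;          /* 0x2c: shifted up by the two new fields */
};

/* Compile-time checks of the two offsets named in the RFC. */
static_assert(offsetof(struct tcb_model, unsafe_stack_ptr) == 0x24,
              "unsafe stack pointer must be at offset 0x24");
static_assert(offsetof(struct tcb_model, unsafe_stack_base) == 0x28,
              "unsafe stack base must be at offset 0x28");
```
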
>> Here is an overview of some other changes that I plan to propose
>> with my musl patches:
>>
>> The segmentation-hardened SafeStack support would be enabled with a
>> new configuration option, "--enable-safe-stack".
>>
>> When this is enabled, many library routines require that both a
>> safe stack and an unsafe stack be available. I modified _start_c in
>> crt1.c to temporarily set up a small, pre-allocated unsafe stack for
> I'd have to see exactly what you mean, but my leaning is that crt1 is
> not a good place for anything new. For dynamic-linked programs,
> crt1-to-__libc_start_main is the main permanent ABI boundary and not
> something you want to have complexity that could need changing.
I revised my patches to move the temporary unsafe stack initialization 
out of crt1 and into __libc_start_main.
>> the early initialization routines to use. I also made similar
>> changes in dlstart.c. A larger unsafe stack is allocated and set up
>> later from either __libc_start_main or __dls3, depending on whether
>> static or dynamic linking is used. I split __dls3 so that it only
>> performs minimal initialization before allocating the larger unsafe
>> stack and then performing the rest of its work in a new __dls4
>> function.
>>
>> After the larger unsafe stack is allocated, I invoke the modify_ldt
>> syscall to insert a segment descriptor with a limit that is below
>> the beginning of the safe stacks. I load that segment descriptor
>> into the DS and ES segment registers to prevent memory accesses
>> through DS and ES from reaching the safe stacks. One purpose of my LLVM and
>> Clang patches is to insert the necessary segment override prefixes
>> to direct accesses to the appropriate segments.
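
The arithmetic behind that descriptor can be sketched in C.  The helper
names and the example address are hypothetical; the sketch assumes a
page-granular limit (as selected by the limit_in_pages field of the
Linux struct user_desc passed to modify_ldt) and a page-aligned lowest
safe stack address.  It does not invoke the syscall itself.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* With page granularity, a limit of L covers offsets [0, (L + 1) << 12).
 * Returns the largest limit that keeps the segment strictly below
 * lowest_safe_stack_addr. */
static uint32_t limit_in_pages_below(uint32_t lowest_safe_stack_addr)
{
    assert(lowest_safe_stack_addr >= (1u << PAGE_SHIFT));
    assert((lowest_safe_stack_addr & ((1u << PAGE_SHIFT) - 1)) == 0);
    return (lowest_safe_stack_addr >> PAGE_SHIFT) - 1;
}

/* Selector for an LDT entry: index in bits 3 and up, table indicator
 * bit (0x4) set to choose the LDT, requested privilege level 3. */
static uint16_t ldt_selector(unsigned entry)
{
    return (uint16_t)((entry << 3) | 0x4 | 0x3);
}
```

For example, if the safe stacks began at the hypothetical address
0xC0000000, the limit would be 0xBFFFF pages, and LDT entry 0 would be
loaded into DS and ES as selector 0x7.
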
> The content on these stacks is purely return addresses, spills, and
> other stuff that's only accessible to compiler-generated code, not
> data whose addresses can be taken, right?
If an allocation's address is taken and passed as an argument to a 
subroutine, then the segmentation-hardened SafeStack pass will move that 
allocation to the unsafe stack.

Of course, function arguments are only one of many possible mechanisms 
for passing information between functions.  For example, one function 
may attempt to store a safe stack pointer into a structure that gets 
passed to another function, or into a global variable.  By default, my 
compiler pass detects and blocks such memory writes.  However, there are 
instances where such writes are necessary.  For example, the va_list 
object used to support variadic arguments stores a pointer to the safe 
stack.  I implemented other compiler patches to emit the SS segment 
override prefix when accessing variadic arguments, so storing the safe 
stack pointer into the va_list object should be allowed.  The 
intraprocedural analysis attempts to detect this type of write, but it 
currently has limitations on the complexity of pointer computations that 
it can handle.  Thus, I added compiler command line options to 
selectively override this analysis for certain files and allow safe 
stack pointers to be written to memory even when the compiler cannot 
verify that they are being written to va_list objects.
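
The va_list case can be illustrated with an ordinary variadic function.
The function name is hypothetical.  On i386, where arguments are passed
on the stack, the va_list object initialized by va_start refers to the
caller's stack-passed arguments, which is why it ends up holding a
pointer into the safe stack.

```c
#include <assert.h>
#include <stdarg.h>

/* Sums `count` int arguments that follow the count itself. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);           /* ap now refers to the variadic arguments */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);  /* each read goes through ap */
    va_end(ap);
    return total;
}
```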

Note that manipulating safe stack addresses in registers within a single 
function is no problem.  For example, traverses_stack_p in expand_heap.c 
compares a safe stack address to other addresses.  Actually, 
traverses_stack_p is interesting in other ways, since it assumes that 
libc.auxv is on the main-thread stack.  My revised patches move auxv to 
the main-thread unsafe stack.  What checks should be performed in 
traverses_stack_p when multiple types of stacks are defined?
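
Purely as a starting point for discussion, one possible shape for such
a check is to apply the same kind of range test to each known stack top.
The function names, the list of stack tops, and the 8 MiB window size
are assumptions made for this sketch, not a description of the existing
code or of my patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the region below each stack top to avoid. */
#define STACK_WINDOW_LEN ((uintptr_t)8 << 20)

/* Nonzero if [old, new) overlaps the window just below `top`. */
static int overlaps_stack_window(uintptr_t old, uintptr_t new, uintptr_t top)
{
    uintptr_t bottom = top > STACK_WINDOW_LEN ? top - STACK_WINDOW_LEN : 0;
    return new > bottom && old < top;
}

/* Nonzero if [old, new) overlaps the window of any listed stack top. */
static int traverses_any_stack(uintptr_t old, uintptr_t new,
                               const uintptr_t *tops, size_t n)
{
    assert(old <= new);
    for (size_t i = 0; i < n; i++)
        if (overlaps_stack_window(old, new, tops[i]))
            return 1;
    return 0;
}
```

The caller would pass one entry per stack type, for example the top of
the main-thread safe stack and the top of the main-thread unsafe stack.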
>
>> Many instructions expect that argc, argv, the environment, and auxv
>> are accessible in the DS and ES segments. These are stored on the
>> initial stack, which is above the limit of the restricted DS and ES
>> segments. I annotated auxv with an attribute to cause the compiler
>> to emit SS segment-override prefixes when accessing auxv. I copied
>> the other data to the heap, which is accessible in DS and ES.
> Invasive arch-specific changes to unrelated code are highly frowned
> upon in musl. I think to be acceptable upstream at all the auxv would
> also have to be relocated by startup code to an address where it's
> accessible.
I revised my patches to do this.
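
A minimal sketch of such a relocation follows.  The function names and
the caller-supplied destination buffer are hypothetical.  The sketch
treats auxv as a sequence of (type, value) word pairs terminated by an
entry of type 0 (AT_NULL), and it is not the code from my patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of words in auxv, including the terminating pair. */
static size_t auxv_words(const size_t *auxv)
{
    size_t n = 0;
    while (auxv[n] != 0) /* type 0 marks the end of the vector */
        n += 2;
    return n + 2;
}

/* Copies auxv into dst, which must lie in memory reachable through DS.
 * Returns the number of words copied, or 0 if dst is too small. */
static size_t relocate_auxv(const size_t *auxv, size_t *dst, size_t dst_words)
{
    assert(auxv != NULL && dst != NULL);
    size_t n = auxv_words(auxv);
    if (n > dst_words)
        return 0;
    memcpy(dst, auxv, n * sizeof *dst);
    return n;
}
```

After the copy, startup code would point libc.auxv at the relocated
vector instead of the original one on the initial stack.
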
>> I modified the pthread routines to allocate and deallocate
>> additional stacks as needed in the appropriate memory ranges. The
>> safe stacks are allocated at high addresses so that they are above
>> the limit of the modified DS and ES segments. The unsafe stack for
>> each new thread is allocated below its TLS region and thread control
>> block, which is where the stack is currently located by default.
> This likely should be hidden inside __clone rather than in
> non-arch-specific sources.
I'll send out the current revision of my patches soon to show you 
exactly what I'm proposing and get your feedback on that.
>> The Linux vDSO code may be incompatible with programs that enable
>> segmentation-hardened SafeStack. For example, it may allocate data
>> on the safe stack and then attempt to access it in DS or ES, which
>> would result in an exception due to the segment limit violation. My
>> patches prevent the vDSO from being invoked when
>> segmentation-hardened SafeStack is enabled.
> That sounds reasonable but rather unfortunate.
>
>> Finally, the i386 __clone implementation is written in assembly
>> language, so the compiler is unable to automatically add a stack
>> segment override prefix to an instruction in that routine that
>> accesses a safe stack. I added that prefix manually in the source
>> code.
>>
>> Comments appreciated.
> Hope the above are helpful.
Yes, very.  Thank you!

  - Michael