Date: Wed, 9 Sep 2015 12:36:01 -0400
From: Zack Weinberg <zackw@...ix.com>
To: gcc@....gnu.org, llvmdev@...uiuc.edu,
 GNU C Library <libc-alpha@...rceware.org>, musl@...ts.openwall.com
Subject: Compiler support for erasure of sensitive data

Over the past couple of months I have been working on adding a function
to glibc for erasure of sensitive data after it's used. This function
already exists in various other libraries under various names --
explicit_bzero, explicit_memset, SecureZeroMemory, memset_s,
OPENSSL_cleanse, etc -- and the problem it solves is, in this code:

    void encrypt_with_phrase(const char *phrase, const char *in,
                             char *out, size_t n)
    {
        char key[16];
        genkey(phrase, key);
        encrypt(key, in, out, n);
        memset(key, 0, 16);
    }

the compiler is entitled to delete the final call to memset, on the
grounds that `key` is dead after the call, so no conforming C program
can access that memory to prove that it has been cleared. However, a
crash dump, a debugger, or a plain old bug in the program might expose
that region of the stack to prying eyes. This is listed at
https://cwe.mitre.org/data/definitions/14.html as a "common weakness".
explicit_bzero (under whatever name) is just bzero with an additional,
documented guarantee that the compiler will not optimize it out even
if the memory region is dead afterward.  To further underline that
this is a common scenario, there are 152 uses of either
OPENSSL_cleanse or explicit_bzero in libressl 2.2.3 (they appear to be
in the middle of changing the name).
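
For context, library-side implementations typically defeat the
optimizer by hiding the call behind something it cannot see through.
Here is a minimal sketch of one common technique, calling memset via a
volatile function pointer (this is roughly how several of the
implementations named above work; the pointer name is mine):

    #include <string.h>

    /* The compiler must reload the volatile pointer at each call and
       cannot prove it still points to memset, so it cannot treat the
       call as a dead store and delete it. */
    static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

    void explicit_bzero(void *ptr, size_t len)
    {
        memset_ptr(ptr, 0, len);
    }

The cost is an indirect call on every use, which motivates the
optimization discussed below.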

During development of my patches, we encountered two problems which
cannot be fully resolved without compiler assistance, so I want to
open a conversation about getting that assistance added to GCC and
Clang.

The first, simpler problem is strictly optimization.  explicit_bzero
can be optimized to memset followed by a vacuous use of the memory
region (generating no machine instructions, but preventing the stores
from being deleted as dead); this is valuable because the sensitive
data is often small and fixed-size, so the memset can in turn be
replaced by inline code.  (This also facilitates implementation of
-D_FORTIFY_SOURCE checks for explicit_bzero.)  Again looking at
libressl, 92 of those 152 uses are improved by a crude version of this
optimization:

    #include <string.h>   /* memset, size_t */

    void explicit_bzero(void *, size_t);
    /* Inline fast path, usable only when LEN is compile-time constant. */
    extern inline __attribute__((gnu_inline, always_inline))
    void explicit_bzero_constn(void *ptr, size_t len)
    {
      /* A struct type exactly LEN bytes wide; the "m" constraint below
         then counts as a use of the entire region, so the stores from
         memset cannot be deleted as dead. */
      typedef struct {char x[len];} memblk;
      memset(ptr, 0, len);
      asm("" : : "m" (*(memblk __attribute__((may_alias)) *)ptr));
    }
    #define explicit_bzero(s, n)                          \
      (__extension__(__builtin_constant_p(n) && (n) > 0   \
                     ? explicit_bzero_constn(s, n)        \
                     : explicit_bzero(s, n)))

I call this "crude" because it only works in GCC, when compiling C,
and when the length parameter is compile-time constant.  GCC issues no
error for this code when 'len' is not compile-time constant, but it is
not documented to work reliably.  When compiling C++, GCC does not
accept a structure containing an array whose size is not *lexically*
constant; even if the body of explicit_bzero_constn is moved into the
macro so that the whole thing is guarded by __builtin_constant_p,
using explicit_bzero with a non-constant size will cause a compile
error.  The same is true for Clang whether compiling C or C++.

This problem could be solved with a very simple feature addition:

    extern inline __attribute__((gnu_inline, always_inline))
    void explicit_bzero(void *ptr, size_t len)
    {
      memset(ptr, 0, len);
      __builtin_use_memory(ptr, len);
    }

where __builtin_use_memory compiles to no machine instructions, but is
treated as a non-deletable use of all LEN bytes starting at PTR,
whether or not LEN is constant.
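
Until such a builtin exists, the closest approximation I know of is an
empty asm with a "memory" clobber.  A sketch (note that this is a much
bigger hammer than __builtin_use_memory, since it acts as a use and
potential modification of *all* reachable memory, not just the LEN
bytes in question):

    #include <string.h>

    void explicit_bzero(void *ptr, size_t len)
    {
        memset(ptr, 0, len);
        /* The "memory" clobber forces the compiler to assume all
           memory is read and written here, so the memset above
           cannot be deleted as dead, even for non-constant LEN. */
        __asm__ __volatile__("" : : "r" (ptr) : "memory");
    }

This handles non-constant lengths, but it also blocks unrelated
optimizations across the barrier, which __builtin_use_memory would not.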

The harder problem, unfortunately, is one of correctness.  Consider
now this example (courtesy Florian Weimer):

    extern void explicit_bzero(void *, size_t);

    struct key { unsigned long long hi, lo; };
    extern struct key get_key(void);
    extern void use_key(struct key);

    void without_clear(void)
    {
        struct key k = get_key();
        use_key(k);
    }

    void with_clear(void)
    {
        struct key k = get_key();
        use_key(k);
        explicit_bzero(&k, sizeof k);
    }

On AMD64 this will be compiled to something like

    without_clear:
        subq    $8, %rsp
        call    get_key
        movq    %rdx, %rsi
        movq    %rax, %rdi
        call    use_key
        addq    $8, %rsp
        ret

    with_clear:
        subq    $24, %rsp
        call    get_key
        movq    %rax, %rdi
        movq    %rdx, %rsi
        movq    %rax, (%rsp)
        movq    %rdx, 8(%rsp)
        call    use_key
        movq    %rsp, %rdi
        movl    $16, %esi
        call    explicit_bzero
        addq    $24, %rsp
        ret

The ABI dictates basically everything you see.  The call to
explicit_bzero has forced the compiler to *create* a second copy of
the variable `k` on the stack, just so it can be erased -- and the
copy in registers survives (at least for a short time), which is not
what the programmer wanted.  With or without explicit_bzero, we have
no way of getting rid of the copy in registers.  More complicated
scenarios of course exist.
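
The closest one can come by hand today is to zero specific registers
with inline asm after the last use, but this is non-portable and
fragile: nothing forces the compiler to have kept the value in those
particular registers, or not to have staged copies elsewhere.  For
illustration only, on x86-64:

    /* Fragile sketch: zero the four registers that happened to hold
       the key in the example above.  The clobber list tells the
       compiler these registers are modified here. */
    static inline void clear_key_regs(void)
    {
        __asm__ __volatile__("xorl %%eax, %%eax\n\t"
                             "xorl %%edx, %%edx\n\t"
                             "xorl %%esi, %%esi\n\t"
                             "xorl %%edi, %%edi"
                             : : : "rax", "rdx", "rsi", "rdi");
    }

This is exactly the sort of bookkeeping the compiler should be doing
for us.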

I think the ideal feature addition to address this would be

    void safe(void)
    {
        struct key __attribute__((sensitive)) k = get_key();
        use_key(k);
    }

which should generate

    safe:
        subq    $8, %rsp
        call    get_key
        movq    %rdx, %rsi
        movq    %rax, %rdi
        call    use_key
        xorq    %rdx, %rdx
        xorq    %rax, %rax
        xorq    %rsi, %rsi
        xorq    %rdi, %rdi
        addq    $8, %rsp
        ret

(use_key *could have* clobbered the values in all four of those
registers, but it is not required to.  If use_key gets inlined, of
course the compiler could track what registers actually contain
sensitive values.)

We would have to nail down precise semantics for __attribute__((sensitive)).
It needs to propagate to invisible copies made by the compiler, and to
*partial* copies in register or stack slots.  Doing a full-on taint
propagation, however, might be too much.  (In the original example,
one would *not* want the compiler to erase `out`, even though what
gets put into `out` is data-dependent on the value of `key`.)
It also needs to work on heap allocations as well as declared variables.
And there are uses in both glibc (crypt(3)) and libressl where the
sensitive datum is one field of a larger struct, other fields of which
must *not* be erased.
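
To make the struct-field case concrete, usage might look like this
(hypothetical syntax, invented field names, shaped after crypt(3)):

    struct crypt_context {
        char setting[32];       /* public salt/setting: must NOT be
                                   erased */
        char key[16] __attribute__((sensitive));
                                /* erased, along with all compiler-made
                                   copies, when the object dies */
    };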

For compatibility with existing code that uses explicit_bzero or
similar, a second attribute would be desirable:

    extern void explicit_bzero(void *, size_t)
        __attribute__((erase_sensitive(1, 2)));

This declaration informs the compiler that the function erases arg#2
bytes of sensitive data pointed to by arg#1.  This has the effect of
back-propagating __attribute__((sensitive)) to the object pointed-to
by argument #1.  The compiler _is_ then allowed to omit a call to the
function provided that its own erasures are at least as thorough.  For
heap allocations (which don't go out of scope) the call to this
function marks the point where erasure of copies should occur.
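
Concretely, for a heap allocation (get_key_into is an invented helper
that fills in *k):

    #include <stdlib.h>

    void heap_example(void)
    {
        struct key *k = malloc(sizeof *k);
        if (!k) return;
        get_key_into(k);
        use_key(*k);
        explicit_bzero(k, sizeof *k);  /* erasure point: *k and any
                                          copies the compiler made of
                                          it are cleared here */
        free(k);
    }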

(I'm proposing an attribute rather than a builtin function because of
the many different names used for this operation by existing code.)

Comments?  Please note that I do not have anything like the time
required to implement any of this myself (and I'm ten years out of
practice on GCC and have no experience whatsoever with Clang,
anyway).  I'm hoping this catches someone's interest.

zw
