Date: Fri, 16 Sep 2022 17:29:27 +0200
From: Florian Weimer <fw@...eb.enyo.de>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: "carlos@...hat.com" <carlos@...hat.com>,  libc-alpha
 <libc-alpha@...rceware.org>,  szabolcs.nagy@....com,
  libc-coord@...ts.openwall.com
Subject: Re: RSEQ symbols: __rseq_size, __rseq_flags vs __rseq_feature_size

* Mathieu Desnoyers:

> /*
>   * C) Check only rseq flags. 32 features at most. One mask and one
>   * comparison.
>   */
>
> void fC(void)
> {
>          if (likely(__rseq_flags & __RSEQ_FLAG_FEATURE_VM_VCPU_ID)) {
>                  /* Use rseq with vcpu_id. */
>                  asm volatile ("ud2\n\t");
>          } else {
>                  /* Fallback. */
>                  asm volatile ("int3\n\t");
>          }

I think it has to be this because we cannot lower __rseq_size below
32 now, not if rseq is active.

If you don't find a better use for the remaining 32 bits of padding,
maybe put the PID or TID there, so that we can create a
system-call-less version of getpid/gettid.  So the flag would just say
that the padding is now completely used.
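
A minimal sketch of what such a system-call-less gettid could look
like. Everything named sketch_* or SKETCH_* is hypothetical: the flag
bit, the struct (which is not the kernel's struct rseq layout) and the
assumption that the kernel would keep the TID field up to date. The
only real interfaces used are syscall() and SYS_gettid for the
fallback.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() in strict modes */
#endif
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag bit: "the former padding now holds a valid TID". */
#define SKETCH_RSEQ_FLAG_TID_VALID (1U << 0)

/* Hypothetical stand-in for the relevant part of the per-thread rseq
   area.  This is NOT the kernel's struct rseq layout. */
struct sketch_rseq_area {
    uint32_t flags;
    uint32_t tid;               /* would occupy the last 32 bits of padding */
};

static inline pid_t sketch_gettid(const struct sketch_rseq_area *area)
{
    /* Fast path: one load and one mask, no system call. */
    if (area->flags & SKETCH_RSEQ_FLAG_TID_VALID)
        return (pid_t) area->tid;

    /* Fallback when the flag is not set: ask the kernel. */
    return (pid_t) syscall(SYS_gettid);
}
```

A caller would pass a pointer to its own thread's area. With the flag
clear, the function behaves like a plain gettid system call.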

Going forward, we can use the size increasing above 32 as a support
indicator.
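
To make the combined rule concrete, here is a rough sketch of how a
caller might decide whether a given field is usable. The helper, the
flag bit and the constants are assumptions for illustration only. The
sketch takes the size and flags as plain arguments instead of reading
the glibc symbols, and it assumes the original fields occupy the first
20 bytes of the 32-byte area.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RSEQ_ORIG_SIZE      32U  /* size reported when rseq is active */
#define SKETCH_RSEQ_ORIG_FIELDS    20U  /* assumed end of the original fields */
/* Hypothetical flag bit: "the padding is now completely used". */
#define SKETCH_RSEQ_FLAG_PADDING_USED (1U << 0)

/* Returns true if the field at [offset, offset + len) may be used. */
static inline bool sketch_field_available(unsigned int size,
                                          unsigned int flags,
                                          unsigned int offset,
                                          unsigned int len)
{
    unsigned int end = offset + len;

    assert(len > 0);

    if (size < SKETCH_RSEQ_ORIG_SIZE)
        return false;           /* rseq is not active at all */
    if (end <= SKETCH_RSEQ_ORIG_FIELDS)
        return true;            /* original fields, always present */
    if (end <= SKETCH_RSEQ_ORIG_SIZE)
        /* Former padding: a single flag covers all of it. */
        return (flags & SKETCH_RSEQ_FLAG_PADDING_USED) != 0;

    /* Beyond the original area: the size itself is the indicator. */
    return size >= end;
}
```

For example, a field at offset 28 needs the flag, while a field at
offset 32 only needs a reported size of at least 36.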

> I can think of 4 approaches that applications will use to detect
> availability of their specific rseq feature for each rseq critical
> section:
>
> 1) Dynamically check whether the feature is implemented at runtime
>     with conditional branches. Those using this approach will probably
>     not want to have the overhead of the two comparisons in approach (A)
>     above. Applications and libraries should probably use their own copy
>     of the glibc symbols for speed purposes.
>
> 2) Implement the entire function as IFUNC and select whether a rseq or
>     non-rseq implementation should be used at C startup. The tradeoff
>     here is code size vs speed, and using IFUNC for things like malloc
>     may add additional constraints on the startup order.
>
> 3) Code rewrite (dynamic code patching) between rseq and non-rseq code.
>     This may be frowned upon in the security area and may not always be
>     possible depending on the context.
>
> 4) JIT compilation of specialized rseq vs non-rseq code. Not generally
>     available in C.
>
> I suspect that glibc may rely on approaches 1+2 depending on the
> situation, and many applications may use approach (1) for simplicity
> reasons.
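
As an illustration of approach (1) with a private copy of the flags,
something along these lines might work. The names are made up, the
flag bit is hypothetical, and the copy is passed in explicitly here
rather than read from the glibc symbol. __builtin_expect is a
GCC/Clang extension standing in for the likely() macro used in the
quoted code.

```c
#include <stdbool.h>

/* Hypothetical feature bit, mirroring the quoted example. */
#define SKETCH_FLAG_FEATURE_VM_VCPU_ID (1U << 0)

/* Library-local copy of the flags, filled in once at startup. */
static unsigned int sketch_cached_flags;

/* Would be called once, e.g. from a constructor, with the value
   obtained from libc. */
static void sketch_cache_flags(unsigned int libc_flags)
{
    sketch_cached_flags = libc_flags;
}

/* One mask and one comparison per call, against the local copy. */
static bool sketch_use_vcpu_id(void)
{
    return __builtin_expect(
        (sketch_cached_flags & SKETCH_FLAG_FEATURE_VM_VCPU_ID) != 0, 1) != 0;
}
```

Each rseq critical section would then branch on sketch_use_vcpu_id()
and take the fallback path when it returns false.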

If the kernel does not currently overwrite the padding, glibc can do
its own per-thread initialization there to support its malloc
implementation (because the padding is undefined today from an
application perspective).  That is, we would initialize these
invisible vCPU IDs the same way we assign arenas today.  That would
cover this specific malloc use case only, of course.
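
Roughly, the idea could be sketched like this. A thread-local
variable stands in for the padding word, and the round-robin
assignment is a deliberate simplification, not glibc's actual arena
selection policy. All names and the ID limit are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_NO_ID   UINT32_MAX   /* marks "not yet assigned" */
#define SKETCH_MAX_IDS 8U           /* hypothetical number of arenas */

/* Process-wide counter used to hand out IDs. */
static _Atomic uint32_t sketch_next_id;

/* Stand-in for the per-thread padding word in the rseq area. */
static _Thread_local uint32_t sketch_thread_id = SKETCH_NO_ID;

/* Returns this thread's pseudo vCPU ID, assigning one on first use. */
static uint32_t sketch_vcpu_id(void)
{
    if (sketch_thread_id == SKETCH_NO_ID) {
        uint32_t n = atomic_fetch_add_explicit(&sketch_next_id, 1,
                                               memory_order_relaxed);
        sketch_thread_id = n % SKETCH_MAX_IDS;  /* simple round-robin */
    }
    return sketch_thread_id;
}
```

An allocator could index its arenas with this value. Unlike a
kernel-maintained ID, it never changes after the first assignment.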
