libc-coord - Re: Re: RSEQ symbols: __rseq_size, __rseq_flags vs __rseq_feature

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0947bd71-f727-ec7d-5759-84a2183d3c69@efficios.com>
Date: Sat, 17 Sep 2022 17:25:11 +0200
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Florian Weimer <fw@...eb.enyo.de>
Cc: Chris Kennelly <ckennelly@...gle.com>, libc-coord@...ts.openwall.com,
 "carlos@...hat.com" <carlos@...hat.com>,
 libc-alpha <libc-alpha@...rceware.org>, szabolcs.nagy@....com
Subject: Re: Re: RSEQ symbols: __rseq_size, __rseq_flags vs
 __rseq_feature_size

On 2022-09-17 16:45, Florian Weimer wrote:
> * Mathieu Desnoyers:
> 
>> On 2022-09-16 23:32, Florian Weimer wrote:
>>> * Chris Kennelly:
>>>
>>>>> If the kernel does not currently overwrite the padding, glibc can do
>>>>> its own per-thread initialization there to support its malloc
>>>>> implementation (because the padding is undefined today from an
>>>>> application perspective).  That is, we would initialize these
>>>>> invisible vCPU IDs the same way we assign arenas today.  That would
>>>>> cover this specific malloc use case only, of course.
>>>
>>>> If a user program updates to a new kernel before glibc does, would it be
>>>> able to easily take advantage of it?
>>>
>>> No, as far as I understand it, there is presently no signaling from
>>> kernel to applications that bypasses the rseq area registration.  So
>>> the only thing you could do is to unregister and re-register with a
>>> compatible value.  And that is of course quite undefined and assumes
>>> that you can do this early enough during the life-time of each thread.
>>>
>>> But if we have the extension handshake, I'll expect us to backport it
>>> quite widely, after some time to verify that it works with CRIU etc.
>>
>> I don't think this is what Chris is asking here.
>>
>> I think the requirement here is to make sure that the extensibility
>> scheme we come up with will allow to extend struct rseq simply by
>> upgrading the kernel, without any need to upgrade glibc. (that's indeed
>> a requirement of mine). So a new application and a new kernel can use a
>> newly available extended field, even with an old glibc.
> 
> I took it for granted that we'd like to get libc out of the picture
> for future changes, so I interpreted Chris' question in the context of
> the initial switch (i.e., enabling rseq features that need
> extensibility on currently released glibc, without upgrading glibc).

I think it makes sense to require that glibc needs to be upgraded to a 
new version before applications can use the feature fields beyond the 
initial 32 bytes. What I care about is that after we have an extensible 
scheme in glibc, then there is no need to upgrade glibc afterwards when 
additional features appear.

> 
>> If we want to keep the kernel ABI as simple as we can, then we just
>> expose the rseq feature size (and required alignment), and don't expose
>> any rseq feature flags. This in turn means that glibc would have to
>> somehow expose the rseq feature size in its ABI. If glibc decides
>> against exposing this rseq feature size symbol, then it would be up to
>> the application to combine information about __rseq_size and
>> getauxval(rseq feature size) to figure out which fields are actually
>> populated. It would "work", but chances are that some users will get it
>> wrong. It seems simpler for a user to simply do:
>>
>> if (__rseq_feature_size >= offsetofend(struct rseq, vm_vcpu_id))
>>
>> to validate whether a given field is indeed populated.
>>
>> The rseq feature size approach would scale to very large feature
>> numbers. It would *not* allow deprecation of fields after they are
>> published, but I see this as a gain in simplicity for users of the ABI,
>> even though we lose a knob as kernel developers.
> 
> I think glibc can register rseq with a new flag once it sees
> AT_RSEQ_FEATURE_SIZE in the auxiliary vector (even if it's 32).

I understand that you propose adding a rseq registration flag to be 
passed to the rseq system call when glibc supports extended rseq. I 
wonder whether it's useful at all. We implicitly know the relevant 
information through the rseq_len parameter of the rseq syscall.

If rseq_len==32, this means glibc allocated a 32-byte struct rseq area 
(either because it was the original structure size, or because 
getauxval() actually reported that at the supported feature size is <= 
32). The kernel is therefore free to use all fields below 32 bytes. This 
basically mean that we get 3 4-byte word fields available for user-space 
use right away without a glibc upgrade. The application just needs to 
use getauxval() to know whether the kernel populates those fields or not.

If rseq_len > 32, this means glibc used the getauxval() information to 
know the area size. Then the kernel can validate that the size is large 
enough to contain all features it supports upon rseq registration.

That
> flag would naturally end up in __rseq_flags.

I'm still unsure that we need to pass a flag to the kernel at all.

   For future extensions
> __rseq_size should work directly.

So for extensions in the last 3 4-byte words of struct rseq (currently 
padding), applications would have to check both __rseq_flags and compare 
the offset of the end of the field with __rseq_size.

Then for fields beyond the 32 first bytes, just checking the __rseq_size 
would suffice. However, we should be careful that the semantic of 
__rseq_size should *not* include padding anymore, but rather end exactly 
after the last supported feature field.

> 
> But as I said, we better use all the padding at once during the first
> step.  Or we could add even more stuff to move past the current 32,
> then we wouldn't need the flag dance. 8-)

Adding semi-useful information in those words means we put a hard 
requirement on users to upgrade their glibc before they can access the 
more important feature fields we would rather like to make available first.

So with your proposal, the application-level users would be expected to 
do something like this:

static uint32_t local_rseq_feature_size;

if (__rseq_flags & RSEQ_FLAG_EXTENDED)
	local_rseq_feature_size = __rseq_size;
else
	local_rseq_feature_size = 20; /* after last orig. field */

and then use:

if (local_rseq_feature_size >= offsetofend(struct rseq, field))

as check for feature availability. I just find it more likely that users 
may get it wrong compared to having the rseq feature size already 
available in a new libc __rseq_feature_size symbol. On the upside there 
is then no need to export an additional symbol.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.