Date: Thu, 16 Sep 2021 16:32:28 -0400
From: Chris Kennelly <ckennelly@...gle.com>
To: Noah Goldstein <goldstein.w.n@...il.com>
Cc: libc-coord@...ts.openwall.com, gcc@....gnu.org, 
	GNU C Library <libc-alpha@...rceware.org>
Subject: Re: Add new ABI '__memcmpeq()' to libc

On Thu, Sep 16, 2021 at 2:31 PM Noah Goldstein <goldstein.w.n@...il.com>
wrote:

>
>
> On Thu, Sep 16, 2021 at 12:55 PM Chris Kennelly via Libc-alpha <
> libc-alpha@...rceware.org> wrote:
>
>> On Thu, Sep 16, 2021 at 1:04 PM Noah Goldstein <goldstein.w.n@...il.com>
>> wrote:
>>
>> > Hi All,
>> >
>> > This is a proposal for a new interface to be supported by libc.
>> >
>> > The new interface is the same as the old 'bcmp()' routine. Essentially
>> > the goal of this proposal is to add a reserved namespace for a new
>> > function, '__memcmpeq()', which shares the same behavior as the old
>> > 'bcmp()'.
>> >
>> > #### Interface ####
>> >
>> > int __memcmpeq(const void *s1, const void *s2, size_t n)
>> >
>> >
>> > #### Description ####
>> >
>> > The '__memcmpeq()' function would compare the two byte sequences 's1'
>> > and 's2', each of length 'n'. If the two byte sequences are equal, the
>> > return would be zero. Otherwise it would return some non-zero
>> > value. 'memcmp()' is a valid implementation of '__memcmpeq()'.
>> >
>> >
>> > #### Use Case ####
>> >
>> > 1. The goal is that '__memcmpeq()' will be usable as an optimization
>> >    by compilers if a program uses the return value of 'memcmp()' as a
>> >    boolean. For example:
>> >
>> >
>> > #include <stdio.h>
>> > #include <string.h>
>> >
>> > void foo(const void* s1, const void* s2, size_t n)
>> > {
>> >     if (!memcmp(s1, s2, n)) {
>> >         printf("memcmp can be optimized to __memcmpeq in this use case\n");
>> >     }
>> > }
>> >
>> >
>> > - In the above case '__memcmpeq()' could be used instead. Due to the
>> >   simpler constraints on the return value of '__memcmpeq()', it can
>> >   be implemented more efficiently for this case than 'memcmp()'. If
>> >   there is no separately optimized version, '__memcmpeq()' can alias
>> >   'memcmp()' and thus be at least as fast.
>> >
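A minimal sketch of why the relaxed return value helps, assuming a plain 64-bit word-at-a-time loop and a hypothetical function name `memcmpeq_words` (not part of the proposal or of any libc). As soon as two words differ, the function can return, whereas `memcmp()` would additionally have to locate the first differing byte and compute the sign of the difference. Real libc implementations use much more elaborate vectorized code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; illustrates the relaxed contract, not any libc's code. */
int memcmpeq_words(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    size_t i = 0;

    /* Compare 8 bytes per step. memcpy() sidesteps alignment and
       strict-aliasing problems; compilers usually turn it into a load. */
    for (; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb)
            return 1; /* no need to find which byte differs, or its sign */
    }
    /* Remaining 0..7 tail bytes, one at a time. */
    for (; i < n; ++i) {
        if (a[i] != b[i])
            return 1;
    }
    return 0;
}
```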
>>
>> LLVM does this transformation (but to bcmp), as part of
>> https://reviews.llvm.org/rG8e16d73346f8091461319a7dfc4ddd18eedcff13.  I
>> seem to recall a small amount of trickiness around determining whether the
>> platform had a bcmp.
>>
>> Since this is intentionally the same as bcmp, is it possible to clarify
>> the motivation for an additional symbol?
>>
>
> The motivation is to get a new reserved namespace for a function that
> memcmp() calls can be transformed to if the return value is only used
> for its boolean value.
>
> I tried to add an optimized version of bcmp() to support LLVM's
> transformation: https://patches-gcc.linaro.org/patch/60168/
> But the consensus seems to be that bcmp() is not suitable because 1)
> it is not in a reserved namespace and 2) since it has had the same
> functionality as memcmp(), programs might have started relying on that
> feature.
>

llvm-libc's bcmp differs from memcmp, but agreed that Hyrum's Law can cause
problems on point #2.

In terms of relying on the feature: If __memcmpeq is ever exposed as a
simple alias for memcmp (since the notes mention that it's a valid
implementation), does that open up the possibility of depending on the
bcmp-like behavior that we were trying to escape?


>
> Do you want me to update the above proposal with this information or
> were you just asking for more clarity for the thread?
>
>
>>
>>
>> > 2. Possible use cases in security, as the runtime of the function can
>> >    be *more* oblivious to the contents of the byte sequences being compared.
>> >
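One way the security angle could play out, sketched under assumptions: the proposal does not require constant-time behaviour, and `memcmpeq_oblivious` is a hypothetical name. Because only equality matters, the loop can OR together the XOR of every byte pair instead of exiting at the first mismatch, so the number of iterations depends only on `n`. The `volatile` qualifiers discourage the compiler from reintroducing an early exit, but this is not a guarantee of constant-time execution on every compiler or CPU.

```c
#include <stddef.h>

/* Hypothetical name. Visits every byte whether or not (and wherever)
   the inputs differ, so the trip count depends only on n. */
int memcmpeq_oblivious(const void *s1, const void *s2, size_t n)
{
    const volatile unsigned char *a = (const volatile unsigned char *)s1;
    const volatile unsigned char *b = (const volatile unsigned char *)s2;
    unsigned char acc = 0;
    size_t i;

    for (i = 0; i < n; ++i)
        acc |= (unsigned char)(a[i] ^ b[i]); /* no early exit */

    /* acc is zero iff every byte pair matched. */
    return acc != 0;
}
```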
>> >
>> > #### Argument Specifications ####
>> >
>> > 1. 's1'
>> >     - All 'n' bytes in the byte sequence starting at 's1' and ending
>> >       at, but not including, 's1 + n' must be accessible memory. There
>> >       are no guarantees about the order in which the sequence will be
>> >       traversed.
>> > 2. 's2'
>> >     - All 'n' bytes in the byte sequence starting at 's2' and ending
>> >       at, but not including, 's2 + n' must be accessible memory. There
>> >       are no guarantees about the order in which the sequence will be
>> >       traversed.
>> > 3. 'n'
>> >     - 'n' may be any value that does not violate the specifications on
>> >       's1' and 's2'.
>> >
>> > If any of the argument specifications are violated there are no
>> > guarantees about the behavior of the interface.
>> >
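Since the traversal order is unspecified, a conforming implementation could, for instance, compare from the end of the buffers, so a caller cannot assume that reading stops at the first mismatching byte. A minimal sketch with the hypothetical name `memcmpeq_reverse`:

```c
#include <stddef.h>

/* Hypothetical name. Walks the buffers from the last byte to the first,
   which is allowed because the traversal order is unspecified. */
int memcmpeq_reverse(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* n-- > 0 visits offsets n-1 down to 0 and stops cleanly at n == 0. */
    while (n-- > 0) {
        if (a[n] != b[n])
            return 1;
    }
    return 0;
}
```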
>> >
>> > #### Return Value Specification ####
>> >
>> > If the byte sequences starting at 's1' and 's2' are equal, the
>> > function will return zero. Otherwise the function will return a
>> > non-zero value.
>> >
>> > Equality between the byte sequences starting at 's1' and 's2' is
>> > defined as follows:
>> >
>> > 1. If 'n' is zero the two sequences are equal.
>> > 2. If 'n' is non-zero then for all 'i' in range [0, n) the byte at
>> >    offset 'i' of 's1' equals the byte at offset 'i' in 's2'.
>> >
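The two equality rules can be written down as a predicate, which is handy for checking candidate implementations against the return value specification. This is a sketch only; `seq_equal` and `is_conforming_result` are hypothetical helper names. Rule 1 falls out of rule 2 as the empty case: with `n` equal to zero there is no offset left to check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true iff the two n-byte sequences are equal. */
bool seq_equal(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    size_t i;

    /* Rule 1: for n == 0 the loop body never runs, so we return true.
       Rule 2: otherwise every offset i in [0, n) must match. */
    for (i = 0; i < n; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}

/* Hypothetical helper: `ret` is a conforming result for (s1, s2, n)
   exactly when it is zero for equal sequences and non-zero otherwise.
   Its sign and magnitude carry no meaning. */
bool is_conforming_result(int ret, const void *s1, const void *s2, size_t n)
{
    return (ret == 0) == seq_equal(s1, s2, n);
}
```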
>> > A simple C implementation of '__memcmpeq()' could be as follows:
>> >
>> >
>> > #include <stddef.h>
>> >
>> > int __memcmpeq(const void* s1, const void* s2, size_t n)
>> > {
>> >     int ret;
>> >     size_t i;
>> >     const unsigned char *s1c, *s2c;
>> >     s1c = (const unsigned char*)s1;
>> >     s2c = (const unsigned char*)s2;
>> >     /* Stop at the first mismatch; any non-zero difference is fine. */
>> >     for (i = 0, ret = 0; ret == 0 && i < n; ++i) {
>> >         ret = s1c[i] - s2c[i];
>> >     }
>> >     return ret;
>> > }
>> >
>> >
>> > #### Notes ####
>> >
>> > This interface is essentially the old 'bcmp()', and 'memcmp()' will
>> > always be a valid implementation of '__memcmpeq()'.
>> >
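As the note says, forwarding to `memcmp()` satisfies the contract, because `memcmp()` returns zero exactly when the sequences are equal; the extra sign information is simply ignored. A minimal sketch with the hypothetical name `memcmpeq_fallback`. Inside libc this would more likely be a symbol alias than a wrapper; outside libc, GCC's `alias` attribute cannot be used this way, because the alias target must be defined in the same translation unit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name. memcmp() returns zero iff the sequences are equal,
   which is all the __memcmpeq() contract asks for. */
int memcmpeq_fallback(const void *s1, const void *s2, size_t n)
{
    return memcmp(s1, s2, n);
}
```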
>> >
>> > #### ABI vs API ####
>> >
>> > This proposal is for '__memcmpeq()' as a new ABI. As an ABI,
>> > '__memcmpeq()' will have value, as using the return value of
>> > 'memcmp()' only as a boolean is quite idiomatic in C code.
>> >
>> > It is, however, possible that this would also be useful as a new
>> > API, especially if there are likely use cases where the compiler
>> > would be unable to prove that '__memcmpeq()' would be a valid
>> > replacement for 'memcmp()'.
>> >
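To illustrate the ABI-versus-API distinction, here is a sketch (all function names are hypothetical) of one `memcmp()` call a compiler could rewrite and one it cannot. In `keys_match()` only the zero/non-zero outcome is observed, which is the pattern the proposal targets. In `cmp_keys()` the result escapes through a `qsort()` comparator whose sign matters, so the call must keep its full `memcmp()` semantics.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 4

/* Eligible: only equality is observed, so a compiler could emit a call
   to __memcmpeq() (or bcmp()) instead of memcmp() here. */
int keys_match(const unsigned char *k1, const unsigned char *k2, size_t n)
{
    return memcmp(k1, k2, n) == 0;
}

/* Not eligible: qsort() orders elements by the sign of this result. */
static int cmp_keys(const void *p, const void *q)
{
    return memcmp(p, q, KEY_LEN);
}

void sort_keys(unsigned char (*keys)[KEY_LEN], size_t count)
{
    qsort(keys, count, sizeof keys[0], cmp_keys);
}
```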
>> >
>> > #### Further Options ####
>> >
>> > If this proposal is received positively, libc could also add
>> > interfaces for '__streq()', '__strneq()', '__wcseq()' and '__wcsneq()'
>> > which similarly would loosen return value restrictions on 'strcmp()',
>> > 'strncmp()', 'wcscmp()' and 'wcsncmp()' respectively.
>> >
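A sketch of what the relaxed '__streq()' contract might look like, assuming it mirrors '__memcmpeq()': zero if the two NUL-terminated strings are equal, any non-zero value otherwise. `streq_sketch` is a hypothetical name; the proposal does not specify these functions beyond the paragraph above.

```c
#include <stddef.h>

/* Hypothetical name: returns zero iff the NUL-terminated strings are
   equal; a non-zero result carries no ordering information. */
int streq_sketch(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    /* Advance while both strings agree and s1 has not ended. */
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    /* Equal only if both strings ended at the same position. */
    return *a != *b;
}
```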
>> > Best,
>> > Noah
>> >
>>
>
