Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 5 Jan 2016 12:50:40 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: string word-at-a-time and atomic.h FAQ on twitter

On Tue, Jan 05, 2016 at 05:46:41PM +0100, Szabolcs Nagy wrote:
> https://twitter.com/johnregehr/status/684126374966198281
> 
> these are old faq items but since we dont have docs about internals
> i try to address them based on my understanding:

An "internals FAQ" would be a welcome addition to the wiki. I think
your text is mostly ok, but I'm including some thoughts below on
improving/fixing these issues and the possible directions we might
take.

> 1) musl strlen oob access:
> 
> the os manages memory with page granularity, this is an internal
> detail that the user should not rely on in c, but the libc can,
> (otherwise malloc/free and the dynamic linker could not be
> implemented in c) so oob accesses in word-at-a-time string
> algorithms do not cause segfault on the os level.
> 
> in theory the compiler is part of the implementation so it can
> treat libc code specially, but in practice libc code is normal
> freestanding c code. this means that the compiler can treat oob
> access arbitrarily following the abstract c semantics if it can
> see through the implementation with lto.  however i find lto a
> weak excuse to rewrite strlen in asm for all targets since lto
> of libc is still not practical and asm implementations
> historically had a lot of target specific bugs in other libcs.
> i think compiler attributes should be used here on compilers that
> might break the code, but there is no attribute for this kind of
> oob access yet (although may_alias attribute is missing here too
> and should be added like in other string functions).

I agree that we should use compiler attributes and fall back to an
utterly naive implementation if __GNUC__ (or other indication that
such attributes are supported) is not defined. Clearly we should be
using may_alias for the word-at-a-time access, but as you've said
there doesn't seem to be any attribute to safely access past the end
of the object.

In the particular case of strlen, the naive unrolled strlen with no
OOB access is actually optimal on most or all 32-bit archs, better
than what we have now. I suspect the same is true for strchr and other
related functions. So we could just consider trying to drop the OOB
accesses. Do we have a list of affected functions? That might be nice
to include.

> this takes care of oob access, but the bytes outside the passed
> object might change concurrently i.e. strlen might introduce a
> data race: again this is a problem on the abstract c language
> level that may be solved e.g. by making all accesses to those
> bytes relaxed atomic, but user code is not under libc control.
> in practice the code works if HASZERO reads the word once so it
> does arithmetics with a consistent value (because the memory
> model of the underlying machine does not treat such race
> undefined and it does not propagate unspecified value bits nor
> has trap representations).

Indeed, this seems like less of a practical concern. I think we could
make it a non-concern via volatile accesses (the practical concern
with data races like this is transformations that convert one read to
multiple reads and expect to see the same value each time) but that
would preclude most optimization by the compiler.

> we do not try to enforce these behaviours on the c level yet
> (only a very narrow set of string functions are affected which
> are also very performance critical), but fortunately those who
> are worried that the code is not correct can always generate asm
> and compile that into the libc. (and then one can verify that
> indeed the generated code is completely correct on the asm level.
> maybe musl will add generated asm to the repo, but there are other
> pending cleanup works related to asm vs c level semantics and
> these should be considered together.)

I would rather not add generated asm; not only is it ugly and
potentially full of directives we may not want (like cfi and stuff
that may not work on some as versions or alternate assemblers) but
also it can't be optimized for the specific target (e.g. modern x86
with sse instead of base i386, armv7 instead of armv4t, etc.), and it
can't be inlined or otherwise optimized with LTO.

> 2) musl atomic.h sync primitives
> 
> the primitives in atomic.h are carefully designed for musl's
> pthread implementation (which seems to me far ahead of other
> implementations in terms of correctness and portability).
> 
> however they are not documented in the code (only in the git
> log) so ppl assume they understand their precise interface
> contract by guessing (which is usually wrong because the names
> are misleading).
> 
> musl does not use 64-bit atomic primitives, a_and_64 and a_or_64
> have secific uses in the malloc implementation which determine
> their semantics.

I think we could replace them with atomic bitset/bitclear. They're
only used in a couple places, mainly malloc.

> 3) a_crash
> 
> formally a_crash can be anything (only called if user invoked
> ub or underlying system broke interface contract).
> 
> in practice a_crash should be __builtin_trap (i.e. the most
> lightweight way of terminating the process and this matters for
> security which is of course not c level semantics), but builtin
> usage is mininmized in musl which makes it possible to compile
> it with several c compilers with consistent behaviour (e.g. gcc
> does not guarantee consistent behaviour for __builtin_trap
> across targets and falls back to abort if a target does not
> have appropriate target hook defined), keeping the interface
> between the compiler and libc minimal is a key design choice
> in musl.
> 
> at some point this should be cleaned up and all targets should
> have proper single instruction crash, but that's low priority
> cleanup work so on some targets this is not yet done (there are
> other pending atomic.h cleanup works).

Indeed. Actually I'd kind of prefer improving a_crash to block signals
(via inline syscall) before issuing the SIGSEGV/SIGILL to ensure that
the kernel terminates the process rather than allowing a handler to
run.

I forget -- do you have an account on the wiki to add these things? If
not we should get you one, but in the mean time someone else could add
this content.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.