Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 25 May 2023 09:07:23 -0400
From: Rich Felker <dalias@...c.org>
To: Jₑₙₛ Gustedt <jens.gustedt@...ia.fr>
Cc: musl@...ts.openwall.com
Subject: Re: [C23 new stdlib 2/4] C23: add the memalignment function

On Thu, May 25, 2023 at 11:16:53AM +0200, Jₑₙₛ Gustedt wrote:
> Rich,
> 
> on Wed, 24 May 2023 17:28:35 -0400 you (Rich Felker <dalias@...c.org>)
> wrote:
> 
> > There's a more efficient implementation of this function which does
> > not depend on __builtin_ctzll (which we don't): p^(p-1)&p
> 
> yes, nice
> 
> If you have other such nice formulas for some of the interfaces in
> <stdbit.h>, please let me know. They are hard to find in search
> engines.
> 
> For `__builtin_ctzll`, hm, I counted also to use these for the
> <stdbit.h> interfaces. I think the intent of those is really to
> provide instruction level support of these bit functionalities that
> avoids going through function calls.
> 
> I see that we use other quite specific builtins, such as
> `__builtin_fma`. Is there a particular reason not to have the bit
> level ones?

That was probably an oversight. The intent of those is to expand to an
fma instruction, but AIUI there's no reason the compiler isn't allowed
to expand it to a call to the fma() function (like the other
__builtin_* versions of standard functions). That's not really a
problem here, but it *could be* in other cases where the definition
may end up being circular.

Really these should be cleaned up to have src/include/*.h recover some
of the builtins that -ffreestanding had to suppress (to prevent
circularity) where they're useful, conditional on probing for compiler
support, rather than using them directly in the code. Then log, pow,
and log2 could just call fma(). More importantly, a lot of
memcpy/memset use could be inlined where it can't now.

> So you want me to use the ctz and clz interfaces from the internal
> atomics for the implementation of <stdbit.h>? Are they not overkill
> for this simple purpose? (I mean they are meant to be atomic, arent't
> they?)

No, I think you're mixing up code which is part of musl and code which
is (included as) part of the application. atomic.h is a musl
implementation detail and is not even present for application code
included from stdbit.h to use.

> > I'm not clear that there's a really good motivation for having it
> > implemented in the header. It kinda falls under the "maybe do it
> > because it's easy" case, but once you switch to the non-GNUC-specific
> > version, it needs a static inline function to avoid multiple
> > evaluation,
> 
> I guess using `({ ... })` would also not be acceptable?

It's presently outside the limited set of GNU C we use, and I'd prefer
to keep it that way.

> > and then it gets to the point of "this is really too much
> > code to have in a header" (which is more an issue of avoiding
> > "creative" content in headers than anything else).
> 
> You probably won't like what I have now for the bit operations, then
> :-(

See NRK's follow-up. If pure header-only implementations of these
interfaces met the requirements of the standard, I could see a good
argument for having implementations there. But if they require
external functions, I'm not clear on why it makes sense to implement
inline versions in the headers rather than just letting the compiler
do its own transformations to inline if/when it wants. This was a path
glibc went down a long time ago with "bits/string2.h", putting gobs of
attempted-optimized code in public headers that was suppressing gcc's
builtin, much better optimizations for the same functions via its
__builtin_*. I don't think we should repeat that.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.