Message-ID: <20260613021000.GL27423@brightrain.aerifal.cx>
Date: Fri, 12 Jun 2026 22:10:00 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: using builtins within musl
On Sat, Jun 13, 2026 at 03:18:42AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@...c.org> [2026-06-12 19:11:12 -0400]:
> > On Fri, Jun 12, 2026 at 11:43:41PM +0200, Szabolcs Nagy wrote:
> > > builtins supposed to improve code generation and most useful when a
> > > library call can be lowered to a few instructions. e.g. memcpy and
> > > memset with a fixed small size can be a few load/store/move ops.
> > >
> > > unfortunately gcc creates a mess on x86_64 of code like
> > >
> > > if (n < 64) __builtin_memcpy(d,s,n);
> > > if (n < 64) __builtin_memset(p,0,n);
> > >
> > > libc.so .text change:
> > >
> > >   arch       diff     size
> > >   x86_64:   +4525   667424
> > >   riscv64:   +724   613171
> > >   aarch64:   -432   679747
> > >   arm:       -152   658809
> >
> > Can you give a brief summary of what gcc does such a bad job of on
> > x86_64? Does it inline something with a bunch of branching cases for
> > different sizes or something? The results on the other archs don't
> > look so bad.
>
> x86_64 inlines more, i assume it is fast, but not the
> best for size, e.g. try on godbolt:
>
> void foo(char *d, char *s, long n)
> {
>     if (n < 64) __builtin_memcpy(d,s,n);
> }

OK, I see. It looks like this behavior is controlled by the stringop
options documented at

https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

There may be a way to tell it to stop being stupid.

It looks like -mstringop-strategy=libcall suppresses all dynamic-n
inlining but still inlines constant-n. I have no idea why that isn't
the default below -O3.
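
For anyone who wants to compare the output locally rather than on
godbolt, here is a minimal sketch. The file name and the function names
are made up for illustration, and what gets inlined will likely vary
with the GCC version and the -O level, so treat it as a starting point
rather than a statement of what the compiler does:

```c
/*
 * stringop_demo.c (hypothetical file name)
 *
 * Compare the generated x86_64 assembly with and without the option:
 *   gcc -O2 -S stringop_demo.c -o default.s
 *   gcc -O2 -mstringop-strategy=libcall -S stringop_demo.c -o libcall.s
 */
#include <assert.h>
#include <stddef.h>

/* Run-time size: the case where the default strategy may expand inline. */
void copy_dynamic(char *d, const char *s, size_t n)
{
	if (n < 64) __builtin_memcpy(d, s, n);
}

/* Compile-time constant size: small enough to become plain load/store ops. */
void copy_fixed8(char *d, const char *s)
{
	__builtin_memcpy(d, s, 8);
}

/* Sanity check: both helpers must copy correctly however they are lowered. */
void check_copies(void)
{
	char src[16] = "0123456789abcde";
	char a[16] = {0}, b[16] = {0};

	copy_dynamic(a, src, 10);
	copy_fixed8(b, src);

	assert(__builtin_memcmp(a, src, 10) == 0 && a[10] == 0);
	assert(__builtin_memcmp(b, src, 8) == 0 && b[8] == 0);
}
```

Diffing default.s against libcall.s should show whether copy_dynamic
turns into a call to memcpy while copy_fixed8 stays inline.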
Rich