Message-ID: <20260613121036.GM3520958@port70.net>
Date: Sat, 13 Jun 2026 14:10:36 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: using builtins within musl

* Rich Felker <dalias@...c.org> [2026-06-12 22:10:00 -0400]:
> On Sat, Jun 13, 2026 at 03:18:42AM +0200, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@...c.org> [2026-06-12 19:11:12 -0400]:
> > > On Fri, Jun 12, 2026 at 11:43:41PM +0200, Szabolcs Nagy wrote:
> > > > builtins supposed to improve code generation and most useful when a
> > > > library call can be lowered to a few instructions. e.g. memcpy and
> > > > memset with a fixed small size can be a few load/store/move ops.
> > > >
> > > > unfortunately gcc creates a mess on x86_64 of code like
> > > >
> > > > if (n < 64) __builtin_memcpy(d,s,n);
> > > > if (n < 64) __builtin_memset(p,0,n);
> > > >
> > > > libc.so .text change:
> > > > arch diff size
> > > > x86_64: +4525 667424
> > > > riscv64: +724 613171
> > > > aarch64: -432 679747
> > > > arm: -152 658809
> > >
> > > Can you give a brief summary of what gcc does such a bad job of on
> > > x86_64? Does it inline something with a bunch of branching cases for
> > > different sizes or something? The results on the other archs don't
> > > look so bad.
> >
> > x86_64 inlines more, i assume it is fast, but not the
> > best for size, e.g. try on godbolt:
> >
> > void foo(char *d, char *s, long n)
> > {
> > if (n < 64) __builtin_memcpy(d,s,n);
> > }
>
> OK, I see. It looks like this behavior is controlled by the stringop
> options documented at
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
>
> There may be a way to tell it to stop being stupid.
>
> It looks like -mstringop-strategy=libcall suppresses all dynamic-n
> inlining but still inlines constant-n. I have no idea why that isn't
> default below -O3.

note: dynamic size inlining happens even at -O0 on gcc-16.

with -mstringop-strategy=libcall the memcpy+memset patch is only +162
bytes of .text compared to master on x86_64, however relocs change like

 +1 -42 memcpy across 22 tu
+30 -11 memset across 33 tu

the new relocs are from code like

struct z {
	int i;
	char a[256];
} x, y;
void foo() {
	x = (struct z){.i=42}; // memset
	y = x; // memcpy
}

before -mstringop-strategy=libcall, struct init was done via "rep stos"
and struct copy via "rep movs".
after, it is done via reg load/store for small structs and via
memset/memcpy calls for large ones, so the option affects fixed size
ops too, not just dynamic ones.
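
a minimal sketch for reproducing the comparison locally, collecting the
snippets from this thread into one file. the file name t.c and the
helper names copy_small, clear_small, init_and_copy and selfcheck are
made up for illustration and are not part of the musl patch. compiling
it with e.g. "gcc -O2 -S t.c" and again with
"gcc -O2 -mstringop-strategy=libcall -S t.c" and diffing the output
should show which operations turn into memcpy/memset calls; the exact
result is expected to depend on gcc version and tuning.

```c
#include <assert.h>   /* only so callers can write assert(selfcheck() == 0) */
#include <string.h>

/* dynamic size with a small upper bound: the case gcc may expand
   inline on x86_64 unless -mstringop-strategy=libcall is given */
void copy_small(char *d, const char *s, long n)
{
	if (n < 64) __builtin_memcpy(d, s, n);
}

void clear_small(char *p, long n)
{
	if (n < 64) __builtin_memset(p, 0, n);
}

/* fixed size struct ops: no explicit library call in the source,
   but the compiler may still emit memset/memcpy for them */
struct z {
	int i;
	char a[256];
} x, y;

void init_and_copy(void)
{
	x = (struct z){.i = 42};  /* candidate for a memset call */
	y = x;                    /* candidate for a memcpy call */
}

/* hypothetical helper: returns the number of failed checks, so the
   behaviour can be verified to be the same under either option */
int selfcheck(void)
{
	char src[8] = "builtin", dst[8] = {0};
	int fail = 0;

	copy_small(dst, src, sizeof src);
	fail += memcmp(dst, src, sizeof src) != 0;

	clear_small(dst, sizeof dst);
	for (size_t i = 0; i < sizeof dst; i++) fail += dst[i] != 0;

	y.i = -1;
	init_and_copy();
	fail += x.i != 42 || y.i != 42;
	for (size_t i = 0; i < sizeof y.a; i++) fail += y.a[i] != 0;

	return fail;
}
```

selfcheck() only confirms the semantics are unchanged; the interesting
part is the generated code and the relocations against memcpy/memset
in the object file.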