Message-ID: <20260613121036.GM3520958@port70.net>
Date: Sat, 13 Jun 2026 14:10:36 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: using builtins within musl

* Rich Felker <dalias@...c.org> [2026-06-12 22:10:00 -0400]:
> On Sat, Jun 13, 2026 at 03:18:42AM +0200, Szabolcs Nagy wrote:
> > * Rich Felker <dalias@...c.org> [2026-06-12 19:11:12 -0400]:
> > > On Fri, Jun 12, 2026 at 11:43:41PM +0200, Szabolcs Nagy wrote:
> > > > builtins are supposed to improve code generation and are most useful when a
> > > > library call can be lowered to a few instructions. e.g. memcpy and
> > > > memset with a fixed small size can be a few load/store/move ops.
> > > > 
> > > > unfortunately gcc creates a mess on x86_64 of code like
> > > > 
> > > >  if (n < 64) __builtin_memcpy(d,s,n);
> > > >  if (n < 64) __builtin_memset(p,0,n);
> > > > 
> > > > libc.so .text change:
> > > >     arch   diff   size
> > > >   x86_64: +4525 667424
> > > >  riscv64:  +724 613171
> > > >  aarch64:  -432 679747
> > > >      arm:  -152 658809
> > > 
> > > Can you give a brief summary of what gcc does such a bad job of on
> > > x86_64? Does it inline something with a bunch of branching cases for
> > > different sizes or something? The results on the other archs don't
> > > look so bad.
> > 
> > x86_64 inlines more, i assume it is fast, but not the
> > best for size, e.g. try on godbolt:
> > 
> > void foo(char *d, char *s, long n)
> > {
> >     if (n < 64) __builtin_memcpy(d,s,n);
> > }
> 
> OK, I see. It looks like this behavior is controlled by the stringop
> options documented at
> https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> 
> There may be a way to tell it to stop being stupid.
> 
> It looks like -mstringop-strategy=libcall suppresses all dynamic-n
> inlining but still inlines constant-n. I have no idea why that isn't
> default below -O3.

note: dynamic size inlining happens even at -O0 on gcc-16
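
for anyone who wants to reproduce this locally, here is a minimal sketch of
the quoted foo example with a small check attached. the names copy_small and
check_copy_small are made up for illustration, and n is a size_t here
(the quoted example uses long) so that a negative length cannot reach the
builtin. compiling it with e.g. "gcc -O0 -S" and again with
"-mstringop-strategy=libcall" and comparing the asm should show whether a
plain memcpy call or an inline expansion is emitted; the exact output depends
on the gcc version and target.

```c
#include <assert.h>
#include <string.h>

/* hypothetical helper: bounded dynamic-size copy, same shape as the
   quoted foo(), but with an unsigned length (an assumption, see above) */
void copy_small(char *d, const char *s, size_t n)
{
    if (n < 64) __builtin_memcpy(d, s, n);
}

/* hypothetical self-check: returns 1 if copy_small behaves as expected */
int check_copy_small(void)
{
    char src[64], dst[64];

    for (size_t i = 0; i < sizeof src; i++) src[i] = (char)i;
    memset(dst, 0x55, sizeof dst);

    /* n < 64: the first 10 bytes are copied, the rest is untouched */
    copy_small(dst, src, 10);
    assert(memcmp(dst, src, 10) == 0);
    assert(dst[10] == 0x55);

    /* n == 64: the guard rejects the copy, dst keeps its old contents */
    copy_small(dst, src + 20, 64);
    assert(dst[0] == src[0]);

    return 1;
}
```

the check only verifies behaviour, it says nothing about code size; for that
the generated asm or the .text size has to be compared by hand.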

with -mstringop-strategy=libcall the memcpy+memset patch adds only +162
bytes of .text compared to master on x86_64, however the relocs change like

  +1 -42 memcpy across 22 tu
 +30 -11 memset across 33 tu

the new relocs are from code like

 struct z {
   int i;
   char a[256];
 } x, y;
 void foo() {
   x = (struct z){.i=42}; // memset
   y = x; // memcpy
 }

without -mstringop-strategy=libcall struct init was done via "rep stos"
and struct copy via "rep movs".

with it, both are done via register load/store for small structs and via
memset/memcpy calls for large ones, so the option affects fixed size ops
too, not just dynamic ones.
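
a self-contained version of the struct example above, as a sketch for
checking this on other compilers or targets. the function names
init_and_copy and check_init_and_copy are made up for illustration. building
the file to an object with and without -mstringop-strategy=libcall and
listing the relocs (e.g. with "objdump -dr") should show whether memset and
memcpy references appear; what is actually emitted depends on the gcc version
and options.

```c
#include <assert.h>
#include <string.h>

struct z {
    int i;
    char a[256];
};

static struct z x, y;

/* same shape as foo() in the example above (hypothetical name) */
void init_and_copy(void)
{
    x = (struct z){.i = 42}; /* zero-fills a[]: candidate for memset */
    y = x;                   /* whole-struct copy: candidate for memcpy */
}

/* hypothetical self-check: returns 1 if init and copy gave the right values */
int check_init_and_copy(void)
{
    /* dirty both objects first so the zero fill is observable */
    memset(&x, 0xff, sizeof x);
    memset(&y, 0xff, sizeof y);

    init_and_copy();

    assert(x.i == 42 && y.i == 42);
    for (size_t k = 0; k < sizeof y.a; k++)
        assert(x.a[k] == 0 && y.a[k] == 0);

    return 1;
}
```

shrinking a[] to a few bytes is an easy way to see where the compiler switches
from a library call to plain register load/store.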
