Message-ID: <20260613121900.GN3520958@port70.net>
Date: Sat, 13 Jun 2026 14:19:00 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>, musl@...ts.openwall.com
Subject: Re: using builtins within musl

* Szabolcs Nagy <nsz@...t70.net> [2026-06-13 14:10:36 +0200]:
> * Rich Felker <dalias@...c.org> [2026-06-12 22:10:00 -0400]:
> > On Sat, Jun 13, 2026 at 03:18:42AM +0200, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@...c.org> [2026-06-12 19:11:12 -0400]:
> > > > On Fri, Jun 12, 2026 at 11:43:41PM +0200, Szabolcs Nagy wrote:
> > > > > builtins are supposed to improve code generation and are most useful when a
> > > > > library call can be lowered to a few instructions. e.g. memcpy and
> > > > > memset with a fixed small size can be a few load/store/move ops.
> > > > > 
> > > > > unfortunately gcc creates a mess on x86_64 of code like
> > > > > 
> > > > >  if (n < 64) __builtin_memcpy(d,s,n);
> > > > >  if (n < 64) __builtin_memset(p,0,n);
> > > > > 
> > > > > libc.so .text change:
> > > > >     arch   diff   size
> > > > >   x86_64: +4525 667424
> > > > >  riscv64:  +724 613171
> > > > >  aarch64:  -432 679747
> > > > >      arm:  -152 658809
> > > > 
> > > > Can you give a brief summary of what gcc does such a bad job of on
> > > > x86_64? Does it inline something with a bunch of branching cases for
> > > > different sizes or something? The results on the other archs don't
> > > > look so bad.
> > > 
> > > x86_64 inlines more, i assume it is fast, but not the
> > > best for size, e.g. try on godbolt:
> > > 
> > > void foo(char *d, char *s, long n)
> > > {
> > >     if (n < 64) __builtin_memcpy(d,s,n);
> > > }
> > 
> > OK, I see. It looks like this behavior is controlled by the stringop
> > options documented at
> > https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> > 
> > There may be a way to tell it to stop being stupid.
> > 
> > It looks like -mstringop-strategy=libcall suppresses all dynamic-n
> > inlining but still inlines constant-n. I have no idea why that isn't
> > default below -O3.
> 
> note: dynamic size inlining happens even at -O0 on gcc-16
> 
> with -mstringop-strategy=libcall the memcpy+memset patch is only +162
> .text size compared to master on x86_64, however relocs change like
> 
>   +1 -42 memcpy across 22 tu
>  +30 -11 memset across 33 tu
> 
> the new relocs are from code like
> 
>  struct z {
>    int i;
>    char a[256];
>  } x, y;
>  void foo() {
>    x = (struct z){.i=42}; // memset
>    y = x; // memcpy
>  }
> 
> before -mstringop-strategy=libcall struct init was done via "rep stos"
> and struct copy via "rep movs".

it seems this has changed in gcc-16: struct init/copy is now
handled identically to builtin memset/memcpy calls, so the flag
does not change things.

> 
> after, it is done via reg load/store for small structs and memset/cpy
> for large, so it affects fixed size ops too, not just dynamic.
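
a minimal sketch for comparing the two forms side by side. it is not
taken from musl or from the patch: foo_assign, foo_builtin and
same_result are made-up names, and pointer arguments are used instead of
the globals in the example above so both variants can run on separate
objects. it only shows that the struct init/copy and the explicit
builtin calls are interchangeable at the source level; what the compiler
emits for each still has to be checked in the asm.

```c
#include <assert.h>
#include <string.h>

struct z {
	int i;
	char a[256];
};

/* struct init + struct copy, same shape as the quoted example */
void foo_assign(struct z *x, struct z *y)
{
	*x = (struct z){.i=42};	/* candidate for memset */
	*y = *x;		/* candidate for memcpy */
}

/* hand-written equivalent using the builtins directly */
void foo_builtin(struct z *x, struct z *y)
{
	__builtin_memset(x, 0, sizeof *x);
	x->i = 42;
	__builtin_memcpy(y, x, sizeof *y);
}

/* returns 1 if both variants leave identical bytes in x and y */
int same_result(void)
{
	struct z x1, y1, x2, y2;

	/* memcmp is only meaningful if struct z has no padding */
	assert(sizeof(struct z) == sizeof(int) + 256);

	/* poison the objects so stale zeros cannot hide a difference */
	memset(&x1, 0xff, sizeof x1);
	memset(&y1, 0xff, sizeof y1);
	memset(&x2, 0xff, sizeof x2);
	memset(&y2, 0xff, sizeof y2);

	foo_assign(&x1, &y1);
	foo_builtin(&x2, &y2);

	return !memcmp(&x1, &x2, sizeof x1) && !memcmp(&y1, &y2, sizeof y1);
}
```

compiling this to asm with and without -mstringop-strategy=libcall and
diffing the output of the two functions should show whether a given gcc
version treats them the same.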
