musl - Re: [PATCH] Make qsort 50% faster

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKHv7pj=HvDsPpMT-JQ7Q0PzD7SQkRP7qS=tPMSe2AcMa9BKaQ@mail.gmail.com>
Date: Fri, 22 May 2026 12:15:19 +0200
From: Paul Schutte <sjpschutte@...il.com>
To: musl@...ts.openwall.com, Justine Tunney <jtunney@...il.com>
Subject: Re: [PATCH] Make qsort 50% faster

Hi all

I apologize if I am going slightly off topic here, but this discussion
reminds me of some observations I made regarding memcpy
visibility/inlining.

On x86_64 (Void Linux), I observed the following with one of my
hashmap benchmarks.

When linked against musl:

./obj/last/Ember_test2 ~/enwiki-20230720-all-titles-in-ns0.shuf
Insertion took: 3.132s
Failed lookups: 0
Lookups took: 2.589s

When linked against glibc:

./obj/last/Ember_test2 ~/enwiki-20230720-all-titles-in-ns0.shuf
Insertion took: 2.345s
Failed lookups: 0
Lookups took: 2.593s

I then replaced libc memcpy with this very naive implementation while
still linking against musl:

sub memcpy -> Pointer:Byte
       args
               d Pointer:Byte cp
               s Pointer:Byte cp
               len Word cp

       for (svar Word i:=0; i < len; inc i)
               d:(i) := s:(i)
       return d

which resulted in:

./obj/last/Ember_test2 ~/enwiki-20230720-all-titles-in-ns0.shuf
Insertion took: 2.349s
Failed lookups: 0
Lookups took: 2.570s

This makes me wonder whether visibility/inlining opportunities around
memcpy may play a larger role than expected in some workloads,
especially for smaller or compiler-visible copy sizes.

Kind Regards
Paul


On Fri, May 22, 2026 at 11:07 AM Szabolcs Nagy <nsz@...t70.net> wrote:
>
> * Justine Tunney <jtunney@...il.com> [2026-05-21 19:29:36 -0700]:
> > This change makes musl qsort go 50% faster on common sizes by avoiding
> > memcpy calls. On x86-64 with cc -Os it can be as much as 2x faster. No
> > changes have been made to the smoothsort algorithm itself. This simple
> > patch brings musl much closer to other C libraries in terms of sorting
> > perf, while maintaining smoothsort's highly desirable properties, e.g.
> > small size, stack safety, signal safety, no malloc dependency, etc.
> >
> > The cost is +670 bytes of .text (on x86-64) to specialize cycle() for
> > widths 4, 8, 12, 16, 20, 24, 28, and 32 (gcc, without stack protector).
> > But if you want to be like OpenBSD's qsort and only specialize 4 and 8,
> > then you're looking at a +140 byte increase. I chose the upper bound of
> > what's practical; after 32, gcc stops inlining memcpy cleanly.
> >
> > Further analysis and benchmark numbers are available in my repo:
> > https://github.com/jart/musl-qsort-speedup
>
> i think upstream musl would need
> __builtin_memcpy for this to work.
>
> > +/* specializes cycle() for constant width */
> > +#define DECLARE_CYCLE(NAME, WIDTH)                                   \
> > +     static void NAME(size_t width, unsigned char* ar[], long n)     \
> > +     {                                                               \
> > +             long i;                                                 \
> > +             char t[WIDTH];                                          \
> > +             if(n < 2)                                               \
> > +                     return;                                         \
> > +             memcpy(t, ar[0], WIDTH);                                \
> > +             for (i = 0; i < n - 1; i++)                             \
> > +                     memcpy(ar[i], ar[i + 1], WIDTH);                \
> > +             memcpy(ar[n - 1], t, WIDTH);                            \
> > +     }
> > +
> > +DECLARE_CYCLE(cycle4, 4)
> > +DECLARE_CYCLE(cycle8, 8)
> > +DECLARE_CYCLE(cycle12, 12)
> > +DECLARE_CYCLE(cycle16, 16)
> > +DECLARE_CYCLE(cycle20, 20)
> > +DECLARE_CYCLE(cycle24, 24)
> > +DECLARE_CYCLE(cycle28, 28)
> > +DECLARE_CYCLE(cycle32, 32)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.