Message-ID: <20260522110755.GI3520958@port70.net>
Date: Fri, 22 May 2026 13:07:55 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Paul Schutte <sjpschutte@...il.com>
Cc: musl@...ts.openwall.com, Justine Tunney <jtunney@...il.com>
Subject: Re: [PATCH] Make qsort 50% faster

* Paul Schutte <sjpschutte@...il.com> [2026-05-22 12:15:19 +0200]:
> I apologize if I am going slightly off topic here, but this discussion
> reminds me of some observations I made regarding memcpy
> visibility/inlining.
> 
> On x86_64 (Void Linux), I observed the following with one of my
> hashmap benchmarks.
> 
> When linked against musl:
> 
> ./obj/last/Ember_test2 ~/enwiki-20230720-all-titles-in-ns0.shuf
> Insertion took: 3.132s
> Failed lookups: 0
> Lookups took: 2.589s
> 
> When linked against glibc:
> 
> ./obj/last/Ember_test2 ~/enwiki-20230720-all-titles-in-ns0.shuf
> Insertion took: 2.345s
> Failed lookups: 0
> Lookups took: 2.593s
> 
> I then replaced libc memcpy with this very naive implementation while
> still linking against musl:
> 
> sub memcpy -> Pointer:Byte
>        args
>                d Pointer:Byte cp
>                s Pointer:Byte cp
>                len Word cp
> 
>        for (svar Word i:=0; i < len; inc i)
>                d:(i) := s:(i)
>        return d
> 
> which resulted in:
> 
> ./obj/last/Ember_test2 ~/enwiki-20230720-all-titles-in-ns0.shuf
> Insertion took: 2.349s
> Failed lookups: 0
> Lookups took: 2.570s
> 
> This makes me wonder whether visibility/inlining opportunities around
> memcpy may play a larger role than expected in some workloads,
> especially for smaller or compiler-visible copy sizes.

the qsort memcpy issue is about using memcpy when builtins
are disabled (musl is compiled as freestanding code)

your issue is that the compiler treats a memcpy call differently
from the naive loop with whatever cflags you are using.
(glibc has per uarch optimizations while musl is essentially
rep movsb, but this only matters if the compiler actually
emits memcpy calls instead of inlining the copy, which it may
do e.g. when the size or alignment is known)
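
a minimal C sketch of the distinction, for illustration only. the
names naive_copy, struct key, copy_key_libc and copy_key_loop are
hypothetical and do not come from musl or from the benchmark above.
both functions copy the same fixed-size object; what differs is
whether the compiler sees a memcpy call (which it may expand inline
when builtins are enabled) or a plain byte loop (which it optimizes
like any other loop).

```c
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time copy, analogous to the naive loop quoted above.
   Hypothetical helper, not a musl function. */
static void *naive_copy(void *restrict d, const void *restrict s, size_t n)
{
	unsigned char *dp = d;
	const unsigned char *sp = s;
	for (size_t i = 0; i < n; i++)
		dp[i] = sp[i];
	return d;
}

/* Hypothetical fixed-size record, so the copy size is a constant. */
struct key { unsigned char bytes[16]; };

/* Size is known at compile time: with builtins enabled the compiler
   will typically expand this inline; with -ffreestanding or
   -fno-builtin it generally emits a real call to memcpy. */
void copy_key_libc(struct key *dst, const struct key *src)
{
	memcpy(dst, src, sizeof *dst);
}

/* Same copy through the plain loop: no libc call is required, and
   the optimizer is free to unroll or vectorize the loop. */
void copy_key_loop(struct key *dst, const struct key *src)
{
	naive_copy(dst, src, sizeof *dst);
}
```

comparing the assembly from e.g. `gcc -O2 -S` and
`gcc -O2 -ffreestanding -S` should show whether copy_key_libc
ends up as an inline copy or as a call; the exact output depends
on the compiler version and target.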
