musl - Re: [PATCH v2] ldso: skip gnu hash calculation if precomputed

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251208231404.GS3520958@port70.net>
Date: Tue, 9 Dec 2025 00:14:04 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>
Cc: Fabian Rast <fabian.rast@....de>, musl@...ts.openwall.com
Subject: Re: [PATCH v2] ldso: skip gnu hash calculation if precomputed

* Rich Felker <dalias@...c.org> [2025-12-08 11:20:42 -0500]:
> On Mon, Dec 08, 2025 at 06:20:58AM +0100, Szabolcs Nagy wrote:
> > if this is right and the hash is good, then we have
> > about 20-30% chance to fail and 70-80% to succeed
> > just with that one bit.

empirically the two bit bloom filter has 4-8%
failure (0.2-0.3 squared) so this looks right.

> > it seems to me that with many dsos (>50), it's not
> > the gnu_hash but the walk over all dsos and checking
> > gnu_lookup_filtered will be the slow part. not
> > sure if it's possible to help with that.
> > i also noticed that some symbols are looked up a lot:
> > in clang >30% of lookups are (multiple relocs per dso)
> > __cxa_pure_virtual
> > _ZTVN10__cxxabiv120__si_class_type_infoE
> > _ZTVN10__cxxabiv117__class_type_infoE
> > _ZTVN10__cxxabiv121__vmi_class_type_infoE
> > and these generate fair bit of lookups too (undef weak):
> > _ITM_deregisterTMCloneTable
> > _ITM_registerTMCloneTable
> > __deregister_frame_info
> > __register_frame_info
> 
> I think modern linkers sort relocations by the symbol referenced,
> which allows you to bypass the lookup if the reference is the same as
> the previous relocation. We do not take advantage of this in the
> do_relocs loop at all, but we could. That would probably give far more
> performance boost than eliding the hashing and might even make eliding
> the hashing pointless.

so there are two optimizations:
repeat: consecutive relocs referencing the same sym
local: relocs referencing a locally defined sym

in case of a "repeat" all of find_sym can be avoided.
in case of "local" the gnu_hash and the final lookup
in find_sym can be avoided (with some extra cost).

so i added instrumentation to count relocs etc:
it seems there are binaries that can benefit from a
"local" optimization like clang and others that can
benefit form "repeat" like libreoffice. the gnu hash
seems to work well: strcmp is only used once per reloc
(however in clang >60% is strcmp(p,p) because local sym)

clang:
!r !l: n  3667  7.8% slen 45.93  7.3% iter 3.70 ff 0.16 hc 3.00 cmp 0.99
 r !l: n 14255 30.4% slen 35.92 22.1% iter 3.93 ff 0.58 hc 4.48 cmp 1.00
!r  l: n 20932 44.6% slen 62.88 56.8% iter 3.25 ff 0.14 hc 3.06 cmp 1.00
 r  l: n  8087 17.2% slen 39.39 13.8% iter 2.93 ff 0.18 hc 3.16 cmp 1.00
other: n    11  0.0% slen 15.09  0.0% iter 1.00 ff 0.00 hc 0.27 cmp 11.82
total: n 46952 100.0% slen 49.31 100.0% iter 3.43 ff 0.28 hc 3.50 cmp 1.00

ffmpeg:
!r !l: n 11196 33.2% slen 15.67 17.2% iter 30.76 ff 1.76 hc 2.96 cmp 0.98
 r !l: n  2029  6.0% slen 29.47  5.8% iter 53.47 ff 1.56 hc 4.92 cmp 1.00
!r  l: n 15803 46.9% slen 38.81 60.0% iter 63.74 ff 3.65 hc 4.64 cmp 1.00
 r  l: n  4580 13.6% slen 37.49 16.8% iter 50.79 ff 2.76 hc 4.55 cmp 1.00
other: n   115  0.3% slen 16.82  0.2% iter 1.00 ff 0.03 hc 0.04 cmp 110.18
total: n 33723 100.0% slen 30.31 100.0% iter 50.20 ff 2.77 hc 4.07 cmp 1.37

libxul:
!r !l: n 13989 49.0% slen 18.18 35.9% iter 28.27 ff 1.31 hc 2.98 cmp 0.97
 r !l: n   586  2.1% slen 32.81  2.7% iter 35.00 ff 2.56 hc 5.00 cmp 1.00
!r  l: n 11582 40.6% slen 30.34 49.6% iter 44.23 ff 2.20 hc 4.21 cmp 1.00
 r  l: n  2299  8.1% slen 35.53 11.5% iter 41.33 ff 2.09 hc 4.51 cmp 1.00
other: n    81  0.3% slen 16.74  0.2% iter 1.00 ff 0.06 hc 0.09 cmp 99.19
total: n 28537 100.0% slen 24.81 100.0% iter 35.86 ff 1.76 hc 3.64 cmp 1.27

libreoffice
!r !l: n 37999 27.6% slen 36.07 25.1% iter 26.67 ff 1.38 hc 3.44 cmp 0.99
 r !l: n 70778 51.4% slen 45.60 59.2% iter 18.03 ff 0.75 hc 3.57 cmp 1.00
!r  l: n 21183 15.4% slen 29.79 11.6% iter 51.49 ff 2.82 hc 4.98 cmp 1.00
 r  l: n  7747  5.6% slen 28.33  4.0% iter 28.85 ff 1.47 hc 4.16 cmp 1.00
other: n   127  0.1% slen 16.83  0.0% iter 1.00 ff 0.07 hc 0.08 cmp 172.51
total: n 137834 100.0% slen 39.55 100.0% iter 26.15 ff 1.28 hc 3.78 cmp 1.16

nodejs
!r !l: n  3677 29.8% slen 24.00 19.2% iter 13.52 ff 0.81 hc 3.27 cmp 0.99
 r !l: n  1974 16.0% slen 28.21 12.1% iter 14.70 ff 0.91 hc 6.07 cmp 1.00
!r  l: n  4583 37.1% slen 50.91 50.7% iter 11.83 ff 0.66 hc 3.15 cmp 1.00
 r  l: n  2102 17.0% slen 39.39 18.0% iter 13.57 ff 0.87 hc 4.65 cmp 1.00
other: n    19  0.2% slen 15.89  0.1% iter 1.00 ff 0.05 hc 0.21 cmp 23.26
total: n 12355 100.0% slen 37.26 100.0% iter 13.07 ff 0.78 hc 3.90 cmp 1.03

legend:
r:repeat relocs, !r:non-repeat, l:local def, !l:non-local def
other: not the normal do_relocs find_sym call (lfs64 etc)
n: reloc count of the given type and total%
slen: avg strlen and total%
iter: avg find_sym iter (dso visits)
ff: avg bloom filter fail count (final success excluded)
hc: avg hashchain iter
cmp: avg strcmp count
avg is sum/n of the given kind of reloc

instrumentation hack is attached for reference.

View attachment "0001-reloc-sym-lookup-stats.patch" of type "text/x-diff" (5250 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.