|
|
Message-Id: <DF4W5AQL60YY.2N7SZII14YLAQ@tum.de> Date: Mon, 22 Dec 2025 17:37:37 +0100 From: "Fabian Rast" <fabian.rast@....de> To: "Rich Felker" <dalias@...c.org> Cc: <musl@...ts.openwall.com> Subject: Re: [PATCH v2] ldso: skip gnu hash calculation if precomputed On Tue Dec 9, 2025 at 2:27 AM CET, Rich Felker wrote: > Since the reference-only symbols aren't in the hashtable, I don't > think there's anything useful you can do with this. I am not sure I understand this correctly. I think that it is possible to use the bloom filter to infer the missing hash bit just like you proposed in the case that a lookup is performed for a symbol that is also defined by the current dso. If useful is supposed to refer to actual usefulness for musl in particular, I agree that this optimization might not be useful because I consider it a hack - the gnu hash table surely was never supposed to be used in this way. Just for completeness I implemented pretty much your initial proposal: > One way to do this would be to use the bloom filter from the dso > making the symbol reference: if only one of the two candidates passes > it, we know which candidate was correct. I would expect this to happen > most of the time, if the bloom filter is actually doing anything > useful to begin with. As people previously pointed out in the discussion, one of the two bits tested per hash is shared between the two candidate hashes h0 and h1, which means in the best case only one bit has to be checked, in the worst case two bits are checked and the hash has to be calculated anyways. Szabolcs Nagy predicted a 70%-80% success rate for this method. This holds up in some practical examples: - clang Total 48253 bloom method applicable 63.3% (30549), success rate: 70.0% (21381) total: 44.3% - gsx Total 30320 bloom method applicable 58.3% (17665), success rate: 74.0% (13068) total: 43.1% - ffmpeg Total 33351 bloom method applicable 60.3% (20111), success rate: 73.6% (14811) total: 44.4% - mpv Total 67330 bloom method applicable 58.6% (39427), success rate: 71.4% (28170) total: 41.8% - libreoffice Total 154576 bloom method applicable 27.3% (42206), success rate: 75.9% (32037) total: 20.7% - webkit2gtk Total 137514 bloom method applicable 34.6% (47560), success rate: 75.1% (35734) total: 26.0% - libxul Total 30188 bloom method applicable 44.6% (13455), success rate: 75.3% (10126) total: 33.5% The remaining benchmarks show similar results to the version with the two divides. This optimization is most effective in clang and less in the other examples. master -> precomp bloom /usr/bin/clang cycles: 32551695 (0.66) -> 26122400 (0.65) -19.75% instructions: 58840422 (0.01) -> 47172812 (0.01) -19.83% ref-cycles: 23629302 (1.22) -> 19315295 (1.23) -18.26% duration_time: 12734736 (1.24) -> 10485433 (1.23) -17.66% /usr/bin/gsx cycles: 45676684 (0.31) -> 43229878 (0.33) -5.36% instructions: 56504823 (0.01) -> 40754248 (0.02) -27.87% ref-cycles: 33672982 (0.98) -> 31472548 (0.87) -6.53% duration_time: 18184999 (1.0) -> 17036436 (0.88) -6.32% /usr/bin/ffmpeg cycles: 72039503 (0.37) -> 67931365 (0.25) -5.7% instructions: 86761463 (0.01) -> 58112029 (0.01) -33.02% ref-cycles: 49514189 (1.04) -> 46358979 (1.01) -6.37% duration_time: 26200043 (1.04) -> 24564555 (1.02) -6.24% /usr/bin/mpv cycles: 218613993 (0.24) -> 211474537 (0.13) -3.27% instructions: 265004308 (0.0) -> 161926673 (0.0) -38.9% ref-cycles: 133443129 (0.62) -> 128415393 (0.45) -3.77% duration_time: 68599776 (0.63) -> 66095647 (0.46) -3.65% /usr/lib/libreoffice/program/soffice.bin cycles: 175438767 (0.18) -> 170710689 (0.17) -2.7% instructions: 266312008 (0.0) -> 199613021 (0.0) -25.05% ref-cycles: 110087107 (0.5) -> 106251019 (0.5) -3.48% duration_time: 56843553 (0.5) -> 54922004 (0.5) -3.38% /usr/lib/libwebkit2gtk-4.1.so.0 cycles: 234214481 (0.16) -> 224880118 (0.11) -3.99% instructions: 329852233 (0.0) -> 233217195 (0.0) -29.3% ref-cycles: 120347198 (1.09) -> 135856350 (0.6) 12.89% duration_time: 62053249 (1.08) -> 69948475 (0.59) 12.72% /usr/lib/firefox/libxul.so cycles: 59222837 (0.27) -> 56319892 (0.27) -4.9% instructions: 72834785 (0.01) -> 53201233 (0.01) -26.96% ref-cycles: 40414897 (1.02) -> 38522864 (0.92) -4.68% duration_time: 21500525 (1.01) -> 20588222 (0.91) -4.24% Interestingly, there is a big regression in perfomance loading libwebkit2gtk that i cannot explain... On my cpu (AMD Ryzen AI 9 365) the version with the two divides performs a little bit better than this version: Benchmark 1 (1471 runs): env LD_LIBRARY_PATH=/home/fr/src/musl/precomp-bloom-install/lib/ /home/fr/src/musl/precomp-bloom-install/lib/ld-musl-x86_64.so.1 --list /home/fr/src/llvm-project/build/bin/clang-22 measurement mean ± σ min … max outliers delta wall_time 6.76ms ± 584us 4.69ms … 8.84ms 16 ( 1%) 0% peak_rss 12.7MB ± 101KB 12.3MB … 12.8MB 20 ( 1%) 0% cpu_cycles 9.18M ± 747K 7.64M … 12.2M 70 ( 5%) 0% instructions 14.9M ± 919 14.9M … 14.9M 48 ( 3%) 0% cache_references 469K ± 4.10K 458K … 489K 13 ( 1%) 0% cache_misses 82.9K ± 1.48K 79.8K … 90.7K 109 ( 7%) 0% branch_misses 80.8K ± 781 79.3K … 92.1K 4 ( 0%) 0% Benchmark 2 (1533 runs): env LD_LIBRARY_PATH=/home/fr/src/musl/precomp-install/lib/ /home/fr/src/musl/precomp-install/lib/ld-musl-x86_64.so.1 --list /home/fr/src/llvm-project/build/bin/clang-22 measurement mean ± σ min … max outliers delta wall_time 6.49ms ± 656us 3.93ms … 8.82ms 58 ( 4%) ⚡- 3.9% ± 0.7% peak_rss 12.7MB ± 101KB 12.3MB … 12.8MB 22 ( 1%) - 0.0% ± 0.1% cpu_cycles 8.95M ± 1.11M 7.22M … 14.3M 129 ( 8%) ⚡- 2.6% ± 0.7% instructions 14.7M ± 857 14.7M … 14.7M 46 ( 3%) ⚡- 1.2% ± 0.0% cache_references 468K ± 5.24K 455K … 493K 8 ( 1%) - 0.2% ± 0.1% cache_misses 83.0K ± 1.67K 79.4K … 89.3K 38 ( 2%) + 0.1% ± 0.1% branch_misses 76.9K ± 580 75.5K … 78.8K 7 ( 0%) ⚡- 4.9% ± 0.1% I have attached the patch for reference. Best Regards, Fabian Rast View attachment "v3-0001-ldso-skip-gnu-hash-calculation-if-precomputed.patch" of type "text/x-patch" (3484 bytes) Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.