|
|
Message-Id: <DEQGS8OFCT9U.223N883ZZNLB@tum.de>
Date: Fri, 05 Dec 2025 18:37:57 +0100
From: "Fabian Rast" <fabian.rast@....de>
To: <musl@...ts.openwall.com>
Subject: Re: [PATCH] ldso: skip gnu hash calculation if precomputed
>> the gnu hash table only stores the upper 31 bits of the 32 bit wide
>> hashes. the lsb is used to indicate the end of a chain.
>> this missing bit can be guessed for the cost of traversing two chains
>> in the worst case.
>
> you mean two chains per loaded dso?
>
> then i'd expect the optimization to be only enabled
> if dso-index-in-symbol-lookup-order < N.
I don't quite understand what you mean by this.
To clarify - when I am talking about "chains" here i mean the chains
internal to the gnu hashtable and not any linked lists of dsos.
I should probably call these "buckets" and not "chains" to avoid confusion, sorry.
This patch modifies how symbols are looked up in a single dso, not
in the "global scope" of loaded dsos.
So, basically, to do a hashtable lookup, you compute the hash of the symbol name,
and use a modulo operation to select the correct chain/bucket in the hashtable.
Then you only check the symbols in that chain/bucket for equality.
Because the precomputed hashes that this patch is using are each missing one
bit, we can only know that the hashvalue for a given symbol name is one of
two possible values, leading to two possible buckets of the hashtable that
have to be searched.
However, if a match is already found in the first bucket we search, we
of course do not have to search the second bucket.
Searching a hashtable bucket is not that expensive, in my experience they
are rather small.
For example here are the number of buckets for each length in libLLVM.so:
length: 1: 924 buckets
length: 2: 1429 buckets
length: 3: 1918 buckets
length: 4: 1895 buckets
length: 5: 1517 buckets
length: 6: 1045 buckets
length: 7: 615 buckets
length: 8: 271 buckets
length: 9: 127 buckets
length: 10: 46 buckets
length: 11: 26 buckets
length: 12: 9 buckets
length: 13: 2 buckets
> how can ghm<<1 == 0 happen?
Suppose that `gh % (8*sizeof(size_t))` is 63, so ghm will be 0x8000000000000000.
When we shift ghm left by one, we end up with zero.
What we actually want is one, because zgh % (8*sizeof(size_t))` should have been zero.
>> I tested this using clang 22 linking libclang-cpp and libLLVM. I inserted
>> timestamps into the loader startup code to measure startuptime until the
>> programs entry point is called.
>> On my laptop (AMD Ryzen AI 9 365) averaged over 200 measurements:
>> Without optimization: 5441.205 us
>> With optimization: 4714.1 us
>> This suggests a roughly 10% improvement in startuptime in this scenario.
>
> i don't quite understand the scenario (is this just clang startup time?
> why different than the number below?)
Yes, this is just startuptime. It is different from the other numbers,
because timing is measured differently. In these numbers there is some
overhead due to the "rdtsc" instruction i used to get timestamps.
In the other numbers, there is overhead because i call the loader
through `env` to set the library path environment. At least this is my
explanaion. Note that I am running in ldd mode to focus on measuring load times.
> but would be good to know how many dsos (and relocs etc) are involved.
These are the dsos involved:
```
$ ~/src/musl-cross-make/output/x86_64-linux-musl/lib/ld-musl-x86_64.so.1 --list ./bin/clang
/home/fr/src/musl-cross-make/output/x86_64-linux-musl/lib/ld-musl-x86_64.so.1 (0x7f4d20ed9000)
libclang-cpp.so.22.0git => /home/fr/src/llvm-project/build/lib/libclang-cpp.so.22.0git (0x7f4d1d400000)
libLLVM.so.22.0git => /home/fr/src/llvm-project/build/lib/libLLVM.so.22.0git (0x7f4d18e00000)
libstdc++.so.6 => /home/fr/src/musl-cross-make/output/x86_64-linux-musl/lib/libstdc++.so.6 (0x7f4d18a00000)
libgcc_s.so.1 => /home/fr/src/musl-cross-make/output/x86_64-linux-musl/lib/libgcc_s.so.1 (0x7f4d20ea6000)
libc.so => /home/fr/src/musl-cross-make/output/x86_64-linux-musl/lib/ld-musl-x86_64.so.1 (0x7f4d20ed9000)
```
Symbol lookups: used `gh_precomp` path / total calls to find_sym2
10189/14955
Approximations for symbol etc count using readelf:
Undefined symbols:
readelf --dyn-syms libclang-cpp.so.22.0git | grep UND -c
defined symbols:
readelf --dyn-syms libclang-cpp.so.22.0git | grep UND -cv
i exclude relative relocations from the count because they never need a symbol
lookup and are irrelevant.
My libraries have relr compressed relative relocations - i counted using a text
editor and do not have a shell one liner.
libclang has 2204 undefined and 37831 defined symbols and 4723 non-relative relocations.
libLLVM has 340 undefined and 39300 defined symbols and 5479 non-relative relocations.
Best Regards,
Fabian Rast
Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.