Message-ID: <20251208014454.GG1827@brightrain.aerifal.cx>
Date: Sun, 7 Dec 2025 20:44:54 -0500
From: Rich Felker <dalias@...c.org>
To: Fabian Rast <fabian.rast@....de>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH v2] ldso: skip gnu hash calculation if precomputed
On Sat, Dec 06, 2025 at 06:31:06PM +0100, Fabian Rast wrote:
> calculating the hashes of symbol names can take a significant amount of
> time while applying relocations. however, for lookups of symbols that
> are themselves part of the gnu hash table of the object being
> relocated, the hash value is already present in that table and can be
> retrieved efficiently via the symbol index.
>
> the gnu hash table stores only the upper 31 bits of each 32-bit hash;
> the lsb is reused to mark the end of a chain. the missing bit can be
> inferred from the position of the symbol index relative to the two
> possible chains.
> ---
> ldso/dynlink.c | 25 +++++++++++++++++++------
> 1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 44d9b181..6dee7a97 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -320,10 +320,23 @@ static Sym *gnu_lookup_filtered(uint32_t h1, uint32_t *hashtab, struct dso *dso,
> #if defined(__GNUC__)
> __attribute__((always_inline))
> #endif
> -static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_def, int use_deps)
> +static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_def, int use_deps, struct dso *dso2, int sym_index)
> {
> - uint32_t h = 0, gh = gnu_hash(s), gho = gh / (8*sizeof(size_t)), *ght;
> - size_t ghm = 1ul << gh % (8*sizeof(size_t));
> + uint32_t h = 0, gh, gho, *ght;
> + size_t ghm;
> +
> + if (dso2 && (ght = dso2->ghashtab) && sym_index >= ght[1] && ght[0] > 1) {
> + uint32_t nbuckets = ght[0], *buckets = ght + 4 + ght[2] * 2;
> + uint32_t h0 = buckets[nbuckets + sym_index - ght[1]] & ~1u, h1 = h0|1;
> + uint32_t i0 = buckets[h0 % nbuckets], i1 = buckets[h1 % nbuckets];
> + gh = i0 < i1 && sym_index < i1
> + || i0 > i1 && sym_index >= i0
> + || !i1 ? h0 : h1;
> + } else gh = gnu_hash(s);
> +
> + gho = gh / (8*sizeof(size_t));
> + ghm = 1ul << gh % (8*sizeof(size_t));
> +
This looks a lot more expensive than the method I suggested using the
bloom filter, which has no modulo, much less two.
> struct symdef def = {0};
> struct dso **deps = use_deps ? dso->deps : 0;
> for (; dso; dso=use_deps ? *deps++ : dso->syms_next) {
> @@ -353,7 +366,7 @@ static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_d
>
> static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
> {
> - return find_sym2(dso, s, need_def, 0);
> + return find_sym2(dso, s, need_def, 0, NULL, STN_UNDEF);
> }
>
> static struct symdef get_lfs64(const char *name)
> @@ -437,7 +450,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
> ctx = type==REL_COPY ? head->syms_next : head;
> def = (sym->st_info>>4) == STB_LOCAL
> ? (struct symdef){ .dso = dso, .sym = sym }
> - : find_sym(ctx, name, type==REL_PLT);
> + : find_sym2(ctx, name, type==REL_PLT, 0, dso, sym_index);
> if (!def.sym) def = get_lfs64(name);
> if (!def.sym && (sym->st_shndx != SHN_UNDEF
> || sym->st_info>>4 != STB_WEAK)) {
> @@ -2345,7 +2358,7 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra)
> return 0;
> } else
> use_deps = 1;
> - struct symdef def = find_sym2(p, s, 0, use_deps);
> + struct symdef def = find_sym2(p, s, 0, use_deps, NULL, STN_UNDEF);
> if (!def.sym) {
> error("Symbol not found: %s", s);
> return 0;
> --
> 2.52.0
>
>
> Hello,
>
> thank you for your feedback on the previous patch.
> I have integrated the idea of using the position of the
> symbol index relative to the start of the two possible chains
> to infer the missing bit in a single calculation that is independent
> of the search path length.
> I decided to do this instead of using the bloom filter because it seems
> more reliable, e.g. the bloom filter might accept both versions of the hash.
>
> There are some edge cases that I considered:
>
> the gnu hash table might have only a single bucket, in which case this
> approach can't work: both candidate hashes index the same bucket, so
> the symbol's position gives no information. (the bloom filter idea
> could still work here, though)
>
> In this case the hash is just calculated normally.
>
>
> The hash table might contain empty chains (buckets[h % nbuckets] == 0).
>
> It is impossible for both candidate buckets to be empty, since the
> symbol itself lies in one of them. If one bucket is empty, the correct
> hash must be the one indexing the other bucket.
>
>
> The relative position of the two chains (which one is first) is undefined.
> The chain indexed by h1 can be before the chain indexed by h0.
> Even if the buckets array is sorted, it can be that
> h0 % nbuckets == nbuckets - 1, and h1 % nbuckets == 0.
>
> > uint32_t i0 = buckets[h0 % nbuckets], i1 = buckets[h1 % nbuckets];
> I used two modulo operations because it was slightly faster than
> branching on (h0 % nbuckets)+1 == nbuckets. I can't confidently explain this...
That's almost surely not true on anything but "big" archs. I would
expect on archs without a hardware div, that even a single modulo
costs a lot more than the entire gnu hash calculation.
Rich