Message-ID: <20251208014454.GG1827@brightrain.aerifal.cx>
Date: Sun, 7 Dec 2025 20:44:54 -0500
From: Rich Felker <dalias@...c.org>
To: Fabian Rast <fabian.rast@....de>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH v2] ldso: skip gnu hash calculation if precomputed

On Sat, Dec 06, 2025 at 06:31:06PM +0100, Fabian Rast wrote:
> calculating the hashes of symbol names can take a significant amount of
> time while applying relocations. however, for lookups of symbols that
> are themselves part of the gnu hash table in the object being
> relocated, the hash value is already present in that hash table and
> can be retrieved efficiently using the symbol index.
> 
> the gnu hash table only stores the upper 31 bits of the 32-bit
> hashes; the lsb is used to indicate the end of a chain.
> this missing bit can be inferred from the position of the symbol index
> relative to the start of the two possible chains.
> ---
>  ldso/dynlink.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 44d9b181..6dee7a97 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -320,10 +320,23 @@ static Sym *gnu_lookup_filtered(uint32_t h1, uint32_t *hashtab, struct dso *dso,
>  #if defined(__GNUC__)
>  __attribute__((always_inline))
>  #endif
> -static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_def, int use_deps)
> +static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_def, int use_deps, struct dso *dso2, int sym_index)
>  {
> -	uint32_t h = 0, gh = gnu_hash(s), gho = gh / (8*sizeof(size_t)), *ght;
> -	size_t ghm = 1ul << gh % (8*sizeof(size_t));
> +	uint32_t h = 0, gh, gho, *ght;
> +	size_t ghm;
> +
> +	if (dso2 && (ght = dso2->ghashtab) && sym_index >= ght[1] && ght[0] > 1) {
> +		uint32_t nbuckets = ght[0], *buckets = ght + 4 + ght[2] * 2;
> +		uint32_t h0 = buckets[nbuckets + sym_index - ght[1]] & ~1u, h1 = h0|1;
> +		uint32_t i0 = buckets[h0 % nbuckets], i1 = buckets[h1 % nbuckets];
> +		gh = i0 < i1 && sym_index < i1
> +		  || i0 > i1 && sym_index >= i0
> +		  || !i1 ? h0 : h1;
> +	} else gh = gnu_hash(s);
> +
> +	gho = gh / (8*sizeof(size_t));
> +	ghm = 1ul << gh % (8*sizeof(size_t));
> +

This looks a lot more expensive than the method I suggested using the
bloom filter, which has no modulo, much less two.
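For reference, the filter probe being alluded to can be sketched as follows. The layout follows the GNU hash bloom filter (nbloom a power of two, so all "modulo" operations reduce to masks); the function name and parameter names are invented for illustration and are not musl API:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: probe a GNU-hash bloom filter with one candidate hash h.
 * nbloom is a power of two, so the word-index modulo is a mask, and
 * the bit-position modulo (by the word width) compiles to an AND as
 * well -- no division anywhere. Names are illustrative only. */
static int bloom_maybe_has(const size_t *bloom, uint32_t nbloom,
                           uint32_t shift, uint32_t h)
{
	unsigned bits = 8 * sizeof(size_t);
	size_t word = bloom[(h / bits) & (nbloom - 1)];
	size_t mask = ((size_t)1 << (h % bits))
	            | ((size_t)1 << ((h >> shift) % bits));
	return (word & mask) == mask;
}
```

As the author notes further down, such a filter can in principle accept both candidate hashes, so it only narrows the search rather than deciding it.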

>  	struct symdef def = {0};
>  	struct dso **deps = use_deps ? dso->deps : 0;
>  	for (; dso; dso=use_deps ? *deps++ : dso->syms_next) {
> @@ -353,7 +366,7 @@ static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_d
>  
>  static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
>  {
> -	return find_sym2(dso, s, need_def, 0);
> +	return find_sym2(dso, s, need_def, 0, NULL, STN_UNDEF);
>  }
>  
>  static struct symdef get_lfs64(const char *name)
> @@ -437,7 +450,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
>  			ctx = type==REL_COPY ? head->syms_next : head;
>  			def = (sym->st_info>>4) == STB_LOCAL
>  				? (struct symdef){ .dso = dso, .sym = sym }
> -				: find_sym(ctx, name, type==REL_PLT);
> +				: find_sym2(ctx, name, type==REL_PLT, 0, dso, sym_index);
>  			if (!def.sym) def = get_lfs64(name);
>  			if (!def.sym && (sym->st_shndx != SHN_UNDEF
>  			    || sym->st_info>>4 != STB_WEAK)) {
> @@ -2345,7 +2358,7 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra)
>  		return 0;
>  	} else
>  		use_deps = 1;
> -	struct symdef def = find_sym2(p, s, 0, use_deps);
> +	struct symdef def = find_sym2(p, s, 0, use_deps, NULL, STN_UNDEF);
>  	if (!def.sym) {
>  		error("Symbol not found: %s", s);
>  		return 0;
> -- 
> 2.52.0
> 
> 
> Hello,
> 
> thank you for your feedback on the previous patch.
> I have integrated the idea of using the position of the
> symbol index relative to the start of the two possible chains
> to infer the missing bit in a single calculation that is independent
> of the search path length.
> I decided to do this instead of using the bloom filter because it seems
> more reliable, e.g. the bloom filter might accept both versions of the hash.
> 
> There are some edge cases that I considered:
> 
> The gnu hash table might have only a single bucket, in which case
> this approach can't work. (The bloom filter idea could still work here, though.)
> 
> In this case the hash is just calculated normally.
> 
> 
> The hash table might contain empty chains (buckets[h % nbuckets] == 0).
> 
> It is impossible for both considered buckets to be empty. If one bucket is
> empty, the correct hash must be the one indexing the other bucket.
> 
> 
> The relative position of the two chains (which one comes first) is undefined:
> the chain indexed by h1 can come before the chain indexed by h0.
> Even if the buckets array is sorted, it can happen that
> h0 % nbuckets == nbuckets - 1 and h1 % nbuckets == 0.
> 
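The inference described above can be exercised in isolation. The sketch below (infer_hash and the toy bucket array are invented for illustration) applies the same expression as the patch:

```c
#include <stdint.h>

/* Given a stored hash h0 (LSB cleared), the gnu hash bucket array,
 * and the symbol's index, recover the missing low bit. i0/i1 are the
 * first symbol indices of the chains the two candidates map to; 0
 * marks an empty bucket. At most one of the two buckets can be empty
 * for a symbol that is actually present in the table. */
static uint32_t infer_hash(uint32_t h0, const uint32_t *buckets,
                           uint32_t nbuckets, uint32_t sym_index)
{
	uint32_t h1 = h0 | 1;
	uint32_t i0 = buckets[h0 % nbuckets];
	uint32_t i1 = buckets[h1 % nbuckets];
	/* chains are laid out in bucket order: the symbol belongs to
	 * the h0 chain if its index falls before the h1 chain starts
	 * (or at/after the h0 chain when the order is reversed), or
	 * if the h1 bucket is empty */
	return ((i0 < i1 && sym_index < i1)
	     || (i0 > i1 && sym_index >= i0)
	     || !i1) ? h0 : h1;
}
```

As in the patch, this needs more than one bucket: with a single bucket both candidates index the same chain and the expression always yields h1.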
> > uint32_t i0 = buckets[h0 % nbuckets], i1 = buckets[h1 % nbuckets];
> I used two modulo operations because it was slightly faster than
> branching on (h0 % nbuckets)+1 == nbuckets. I can't confidently explain this...

That's almost surely not true on anything but "big" archs. I would
expect that on archs without a hardware div, even a single modulo
costs a lot more than the entire gnu hash calculation.
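For scale: the hash in question is the Bernstein-style h*33+c function, one shift and two adds per byte, essentially as it appears in dynlink.c:

```c
#include <stdint.h>

/* GNU hash function: h = h*33 + c per byte, written in the
 * strength-reduced form (h += h*32 + c) used by musl. No division
 * or multiplication instruction is needed on any arch. */
static uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const void *)s0;
	uint_fast32_t h = 5381;
	for (; *s; s++)
		h += h*32 + *s;
	return h;
}
```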

Rich
