Message-ID: <20251208014454.GG1827@brightrain.aerifal.cx>
Date: Sun, 7 Dec 2025 20:44:54 -0500
From: Rich Felker <dalias@...c.org>
To: Fabian Rast <fabian.rast@....de>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH v2] ldso: skip gnu hash calculation if precomputed

On Sat, Dec 06, 2025 at 06:31:06PM +0100, Fabian Rast wrote:
> calculating the hashes of symbol names can take a significant amount of
> time while applying relocations. however, for lookups of symbols that
> are themselves part of the gnu hash table in the object being
> relocated, the hash value is already present in that hash table and
> can be retrieved efficiently using the symbol index.
> 
> the gnu hash table only stores the upper 31 bits of the 32-bit
> hashes; the lsb is used to indicate the end of a chain.
> this missing bit can be inferred from the position of the symbol index
> relative to the start of the two possible chains.
> ---
>  ldso/dynlink.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> index 44d9b181..6dee7a97 100644
> --- a/ldso/dynlink.c
> +++ b/ldso/dynlink.c
> @@ -320,10 +320,23 @@ static Sym *gnu_lookup_filtered(uint32_t h1, uint32_t *hashtab, struct dso *dso,
>  #if defined(__GNUC__)
>  __attribute__((always_inline))
>  #endif
> -static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_def, int use_deps)
> +static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_def, int use_deps, struct dso *dso2, int sym_index)
>  {
> -	uint32_t h = 0, gh = gnu_hash(s), gho = gh / (8*sizeof(size_t)), *ght;
> -	size_t ghm = 1ul << gh % (8*sizeof(size_t));
> +	uint32_t h = 0, gh, gho, *ght;
> +	size_t ghm;
> +
> +	if (dso2 && (ght = dso2->ghashtab) && sym_index >= ght[1] && ght[0] > 1) {
> +		uint32_t nbuckets = ght[0], *buckets = ght + 4 + ght[2] * 2;
> +		uint32_t h0 = buckets[nbuckets + sym_index - ght[1]] & ~1u, h1 = h0|1;
> +		uint32_t i0 = buckets[h0 % nbuckets], i1 = buckets[h1 % nbuckets];
> +		gh = i0 < i1 && sym_index < i1
> +		  || i0 > i1 && sym_index >= i0
> +		  || !i1 ? h0 : h1;
> +	} else gh = gnu_hash(s);
> +
> +	gho = gh / (8*sizeof(size_t));
> +	ghm = 1ul << gh % (8*sizeof(size_t));
> +

This looks a lot more expensive than the method I suggested using the
bloom filter, which has no modulo, much less two.
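For reference, the filter probe being alluded to can be sketched as follows. The layout follows the GNU hash bloom filter (nbloom a power of two, so all "modulo" operations reduce to masks); the function name and parameter names are invented for illustration and are not musl API:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: probe a GNU-hash bloom filter with one candidate hash h.
 * nbloom is a power of two, so the word-index modulo is a mask, and
 * the bit-position modulo (by the word width) compiles to an AND as
 * well -- no division anywhere. Names are illustrative only. */
static int bloom_maybe_has(const size_t *bloom, uint32_t nbloom,
                           uint32_t shift, uint32_t h)
{
	unsigned bits = 8 * sizeof(size_t);
	size_t word = bloom[(h / bits) & (nbloom - 1)];
	size_t mask = ((size_t)1 << (h % bits))
	            | ((size_t)1 << ((h >> shift) % bits));
	return (word & mask) == mask;
}
```

As the author notes further down, such a filter can in principle accept both candidate hashes, so it only narrows the search rather than deciding it.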

>  	struct symdef def = {0};
>  	struct dso **deps = use_deps ? dso->deps : 0;
>  	for (; dso; dso=use_deps ? *deps++ : dso->syms_next) {
> @@ -353,7 +366,7 @@ static inline struct symdef find_sym2(struct dso *dso, const char *s, int need_d
>  
>  static struct symdef find_sym(struct dso *dso, const char *s, int need_def)
>  {
> -	return find_sym2(dso, s, need_def, 0);
> +	return find_sym2(dso, s, need_def, 0, NULL, STN_UNDEF);
>  }
>  
>  static struct symdef get_lfs64(const char *name)
> @@ -437,7 +450,7 @@ static void do_relocs(struct dso *dso, size_t *rel, size_t rel_size, size_t stri
>  			ctx = type==REL_COPY ? head->syms_next : head;
>  			def = (sym->st_info>>4) == STB_LOCAL
>  				? (struct symdef){ .dso = dso, .sym = sym }
> -				: find_sym(ctx, name, type==REL_PLT);
> +				: find_sym2(ctx, name, type==REL_PLT, 0, dso, sym_index);
>  			if (!def.sym) def = get_lfs64(name);
>  			if (!def.sym && (sym->st_shndx != SHN_UNDEF
>  			    || sym->st_info>>4 != STB_WEAK)) {
> @@ -2345,7 +2358,7 @@ static void *do_dlsym(struct dso *p, const char *s, void *ra)
>  		return 0;
>  	} else
>  		use_deps = 1;
> -	struct symdef def = find_sym2(p, s, 0, use_deps);
> +	struct symdef def = find_sym2(p, s, 0, use_deps, NULL, STN_UNDEF);
>  	if (!def.sym) {
>  		error("Symbol not found: %s", s);
>  		return 0;
> -- 
> 2.52.0
> 
> 
> Hello,
> 
> thank you for your feedback on the previous patch.
> I have integrated the idea of using the position of the
> symbol index relative to the start of the two possible chains
> to infer the missing bit in a single calculation that is independent
> of the search path length.
> I decided to do this instead of using the bloom filter because it seems
> more reliable, e.g. the bloom filter might accept both versions of the hash.
> 
> There are some edge cases that I considered:
> 
> The gnu hash table might have only a single bucket, in which case
> this approach can't work. (The bloom filter idea could still work here, though.)
> 
> In this case the hash is just calculated normally.
> 
> 
> The hash table might contain empty chains (buckets[h % nbuckets] == 0).
> 
> It is impossible for both considered buckets to be empty. If one bucket is
> empty, the correct hash must be the one indexing the other bucket.
> 
> 
> The relative position of the two chains (which one comes first) is undefined:
> the chain indexed by h1 can come before the chain indexed by h0.
> Even if the buckets array is sorted, it can happen that
> h0 % nbuckets == nbuckets - 1 and h1 % nbuckets == 0.
> 
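The inference described above can be exercised in isolation. The sketch below (infer_hash and the toy bucket array are invented for illustration) applies the same expression as the patch:

```c
#include <stdint.h>

/* Given a stored hash h0 (LSB cleared), the gnu hash bucket array,
 * and the symbol's index, recover the missing low bit. i0/i1 are the
 * first symbol indices of the chains the two candidates map to; 0
 * marks an empty bucket. At most one of the two buckets can be empty
 * for a symbol that is actually present in the table. */
static uint32_t infer_hash(uint32_t h0, const uint32_t *buckets,
                           uint32_t nbuckets, uint32_t sym_index)
{
	uint32_t h1 = h0 | 1;
	uint32_t i0 = buckets[h0 % nbuckets];
	uint32_t i1 = buckets[h1 % nbuckets];
	/* chains are laid out in bucket order: the symbol belongs to
	 * the h0 chain if its index falls before the h1 chain starts
	 * (or at/after the h0 chain when the order is reversed), or
	 * if the h1 bucket is empty */
	return ((i0 < i1 && sym_index < i1)
	     || (i0 > i1 && sym_index >= i0)
	     || !i1) ? h0 : h1;
}
```

As in the patch, this needs more than one bucket: with a single bucket both candidates index the same chain and the expression always yields h1.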
> > uint32_t i0 = buckets[h0 % nbuckets], i1 = buckets[h1 % nbuckets];
> I used two modulo operations because it was slightly faster than
> branching on (h0 % nbuckets)+1 == nbuckets. I can't confidently explain this...

That's almost surely not true on anything but "big" archs. I would
expect that on archs without a hardware div, even a single modulo
costs a lot more than the entire gnu hash calculation.
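For scale: the hash in question is the Bernstein-style h*33+c function, one shift and two adds per byte, essentially as it appears in dynlink.c:

```c
#include <stdint.h>

/* GNU hash function: h = h*33 + c per byte, written in the
 * strength-reduced form (h += h*32 + c) used by musl. No division
 * or multiplication instruction is needed on any arch. */
static uint32_t gnu_hash(const char *s0)
{
	const unsigned char *s = (const void *)s0;
	uint_fast32_t h = 5381;
	for (; *s; s++)
		h += h*32 + *s;
	return h;
}
```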

Rich
