musl - Re: ldso : dladdr support

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120811230536.GQ27715@brightrain.aerifal.cx>
Date: Sat, 11 Aug 2012 19:05:36 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: ldso : dladdr support

On Wed, Aug 08, 2012 at 03:57:15PM +0200, musl wrote:
> Same as before except I use bit mask to support multiple hash algorithms.

Sorry for taking a while to get back to you. I haven't had as much
time to work on musl the past couple weeks and some other topics (like
mips dynamic linking) had priority, but I hope to have more again for
a while now. Here's a quick review of some things that will hopefully
turn into a discussion for improving/simplifying the code.

> +#ifdef _GNU_SOURCE
> +typedef struct {
> +    const char *dli_fname;  /* Pathname of shared object that
> +                               contains address */
> +    void       *dli_fbase;  /* Address at which shared object
> +                               is loaded */
> +    const char *dli_sname;  /* Name of nearest symbol with address
> +                               lower than addr */
> +    void       *dli_saddr;  /* Exact address of symbol named
> +                               in dli_sname */
> +} Dl_info;

musl policy is not to have commentary, especially copied from
third-party sources, in the public headers. This is partly to
strengthen the claim that all public headers are public domain
(contain no copyrightable content) and partly just to avoid size creep
and wasted parsing time by the compiler.

>  static void decode_dyn(struct dso *p)
>  {
> -	size_t dyn[DYN_CNT] = {0};
> -	decode_vec(p->dynv, dyn, DYN_CNT);
> -	p->syms = (void *)(p->base + dyn[DT_SYMTAB]);
> -	p->hashtab = (void *)(p->base + dyn[DT_HASH]);
> -	p->strings = (void *)(p->base + dyn[DT_STRTAB]);
> +	size_t *v;
> +	p->hashalgs = 0;
> +	for (v = p->dynv; v[0]; v+=2) {
> +		switch (v[0]) {
> +		case DT_SYMTAB:
> +			p->syms = (void *)(p->base + v[1]);
> +			break;
> +		case DT_HASH:
> +			p->hashtabs[SYSV_HASH] = (void *)(p->base + v[1]);
> +			p->hashalgs |= (1 << SYSV_HASH);
> +			break;
> +		case DT_STRTAB:
> +			p->strings = (void *)(p->base + v[1]);
> +			break;
> +		case DT_GNU_HASH:
> +			p->hashtabs[GNU_HASH] = (void *)(p->base + v[1]);
> +			p->hashalgs |= (1 << GNU_HASH);
> +			break;
> +		default:
> +			break;
> +		}
> +	}
>  }

This is rather ugly but I don't see a better way right off, since
DT_GNU_HASH has that huge value... Maybe it would be nice to improve
decode_vec to have a variant that takes a (static const) table of DT_x
values and struct offsets to store them at when found..? This is just
some rambling and I'm not sure it's better, but it might be smart if
we're going to want to continue adding support for more
non-original-sysv DT_* entries with huge values, so we don't have to
write new switches for each one.

BTW, while I _want_ it to be safe, it's possible that early switches
(early meaning prior to the comment in __dynlink that says libc is now
fully functional) will actually fail to work/crash on some archs... So
this needs consideration too.

>  static struct dso *load_library(const char *name)
> @@ -784,8 +899,11 @@ end:
>  static void *do_dlsym(struct dso *p, const char *s, void *ra)
>  {
> [...]
>  	if (sym && sym->st_value && (1<<(sym->st_info&0xf) & OK_TYPES))
>  		return p->base + sym->st_value;
>  	if (p->deps) for (i=0; p->deps[i]; i++) {
> -		sym = lookup(s, h, p->deps[i]);
> +		algs = p->deps[i]->hashalgs;
> +		if (!(algs & ok)) {
> +			if (algs & SYSV_HASH) {
> +				h[SYSV_HASH] = sysv_hash(s);
> +				sym = sysv_lookup(s, h[SYSV_HASH], p->deps[i]);
> +				ok |= SYSV_HASH;
> +			} else {
> +				h[GNU_HASH] = gnu_hash(s);
> +				sym = gnu_lookup(s, h[GNU_HASH], p->deps[i]);
> +				ok |= GNU_HASH;
> +			}
> +		} else {
> +			if (algs & SYSV_HASH)
> +				sym = sysv_lookup(s, h[SYSV_HASH], p->deps[i]);
> +			else
> +				sym = gnu_lookup(s, h[GNU_HASH], p->deps[i]);
> +		}

This looks like a lot of code duplication and extra unnecessary
variables. The way I would do it is something like:

if (p->deps[i]->hashtab && (h || !p->deps[i]->ghashtab)) {
	if (!h) h = hash(s);
	sym = sysv_lookup(s, h, p->deps[i]);
}

i.e. if there's a sysv hash table and we've already computed h (sysv
hash) or if there's no gnu hash table, compute h if it wasn't already
computed, and then attempt a lookup with it.

I'm not sure I got the logic all right (this is really a 1-minute
glance over the code right now amidst doing lots of other stuff too)
but the ideas are:

- no need for extra vars for bitmask. Whether the hash var for the
  corresponding hash type is nonzero is sufficient to tell whether
  it's been computed.
- no need for extra vars/fields to store which hash types a dso has.
  Just use the hashtab/ghashtab fields in the dso struct, and let them
  be null if the corresponding hash table does not exist. (And don't
  make them an array unless there's a real benefit in using an array;
  I don't think there is any benefit unless you're aiming for
  extensibility to support N hash types.)

> +static int do_dladdr (void *addr, Dl_info *info)
> [...]
> +			if (p->hashalgs & (1 << SYSV_HASH)) {
> +				hashtab = p->hashtabs[SYSV_HASH];
> +				for (i = 0; i < hashtab[1]; i++) {

I'm not seeing why this function needs hash tables at all. It's not
looking up symbols, just iterating over the entire symbol table, no?
Please explain if I'm mistaken.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.