musl - Re: ldso : dladdr support

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <502237A1.1000805@gmail.com>
Date: Wed, 08 Aug 2012 11:55:45 +0200
From: musl <b.brezillon.musl@...il.com>
To: musl@...ts.openwall.com
CC: Rich Felker <dalias@...ifal.cx>
Subject: Re: ldso : dladdr support

Here is the new patch for dladdr + gnu hash support.

Please tell me if you find coding style errors.

On 08/08/2012 01:09, Rich Felker wrote:
> On Tue, Aug 07, 2012 at 04:15:34PM +0200, musl wrote:
>>> actually i'm not sure if the function pointer approach is
>>> the right one here, but that's a design question
>>>
>> Could you tell me why (performance)?
> Some reasons I would avoid gratuitous tables of function pointer
> tables for things like this:
>
> 1. In shared libraries, they have to be relocated at load time, which
>    adds to startup time cost. A single table doesn't make much
>    difference, but the _policy_ of avoiding them in general can make a
>    difference when there are many places they might have been used.
>
> 2. Likewise, since they're non-constant, they have to be in the data
>    segment, which increases the size of the library's data segment.
>    Same "single table vs policy" point as in #1 applies.
>
> 3. Since they're non-constant, they're nice targets for an attacker
>    looking to perform code execution exploits. I don't like
>    artificially restricting our use of the language just because
>    certain constructs could help an attacker exploit an
>    ALREADY-vulnerable program, but I also think it's prudent to avoid
>    gratuitous uses like this that serve no purpose and potentially
>    contribute to exploitability.
>
I reworked the implementation to remove all function pointers.
>
>>>> +	uint32_t *hashtab = dso->hashtab;
>>>> +	uint32_t *buckets = hashtab + 4 + (hashtab[2] * (sizeof(size_t) / sizeof(uint32_t)));
>>> i don't know the algorithm but the sizeof magic looks suspicious to me
>> I took the algorithm described here :
>>
>> https://blogs.oracle.com/ali/entry/gnu_hash_elf_sections
>>
>> The maskwords are native word size (64 bits for elf64 and 32 bits for elf32).
>> I could define these macros:
>> #define MASKWORD_SIZE sizeof (Elf32_Word)
>> #define MASKWORD_SIZE sizeof (Elf64_Word)
> In that case, we use size_t, not the ElfXX_Foo mess which only makes
> sense when writing generic tools for dealing with ELF files from
> arbitrary archs, not when handling native ELF structures.
>
>>>> +static uint32_t gnu_hash (const char *s0)
>>>> +{
>>>> +	const unsigned char *s = (void *)s0;
>>>> +	uint_fast32_t h = 5381;
>>>> +	for (unsigned char c = *s; c != '\0'; c = *++s)
>>>> +		h = h * 33 + c;
>>>> +	return h & 0xffffffff;
>>>> +}
>>>> +
>>> i think c != '\0' is not very idiomatic
>>> and i'd avoid using c99 style declaration in for
>>>
>>> use
>>>
>>> for (; *s; s++)
>>> 	h = 33*h + *s;
>>>
>> Corrected.
> Indeed, that's a lot cleaner.
>
>>>> +	uint32_t shift2 = hashtab[3];
>>>> +	uint32_t h2 = h1 >> shift2;
>>> i'm not sure if input validation makes sense in ldso
>>> but shifts can be tricky (hashtab[3] must be in 0..31 here)
>> By validation do you mean mask the shift value with 0x1F ?
> There are millions of ways an invalid/corrupt ELF file can cause the
> dynamic linker/calling program to invoke UB. I don't see much value in
> checking for them. Such invalid files will only exist if created
> intentionally or if your storage medium is corrupted. There is no
> "safe" interface corresponding to dlopen by which you can try opening
> a DSO and sanity-checking it before accessing any code or data from it
> (e.g. the ctors run before dlopen returns!) so the only way to handle
> evil ELF files is to validate the image on disk _before_ calling
> dlopen, and opening it with an absolute pathname to a location under
> your control.
>
>>>> +	for (h1 &= ~1; 1; sym++) {
>>> the 1; in the middle is unnecessary
>>>
>>> h1 &= ~1; is problematic if signed int representation is
>>> not two's complement
>>> (~1 is an implementation defined negative value which is
> It's not implementation-defined. It's defined as operating on the bit
> representation, and once the signed representation is chosen among the
> 3 allowed, the implementation is forced to perform ~ accordingly.
>
>>> then converted to uint32_t according to well defined rules)
> musl will never deal with non-twos-complement implementations (AFAIK,
> no non-twos-complement conformant C implementation has ever been
> created) so it's not a big deal, but I agree it would be cleaner to
> avoid the issue. Personally I try to avoid ever using bitwise
> operators on signed types.
>
>>> i think we can have more efficient code here if
>>> only gnu and sysv hash algos are supported
>>> (i don't think it's realistic to assume new hash algorithms
>>> even gnu hash is a fairly useless addition to elf)
>> This has to do with the function pointer approach.
>> Do you prefer if/else alternative?
> Yes, I prefer if/else.
>
> Also, we had a conversation on the list and/or IRC a while back (I
> don't remember which) where I described some optimizations that should
> be present if gnu hash is to be supported. Basically, the dynamic
> linker should keep track of whether there's one of the two hashes
> that's supported by ALL loaded libs, and in that case, only ever use
> that hash, to avoid computing two different hashes.
The hash for given algo is only computed once (if needed).
That's the reason for the computed table.
If all libs uses the same hash algo, the other will never be computed.
> Another issue that needs to be fixed before gnu hash support can be
> committed is that you removed the hash comparisons for "dlopen",
> "dlsym", and "__stack_chk_fail". Incuring a strcmp against these 3
> strings for every single symbol lookup seems extremely costly; that's
> why I did the hash comparisons in the first place. Keeping track of
> them is important so we can avoid the overhead of supporting these
> functions when there's no way the application can ever attempt to use
> them, but we also want to minimize the startup overhead of making this
> determination.
I added the precomp table wich includes both gnu and sysv hashes for the dlopen,
dlsym and __stack_chk_fail symbols.
>
> Rich


View attachment "ldso-add-dladdr-and-gnu-hash-support.patch" of type "text/x-patch" (9711 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.