Date: Sat, 8 Dec 2012 19:16:16 -0500
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: static linking and dlopen

On Sun, Dec 09, 2012 at 02:04:43AM +0200, Paul Schutte wrote:
> > On the flip side, the main legitimate uses for dynamic linking and
> > loading are (1) sharing code that's used by a wide range of
> > applications and allowing it to be upgraded system-wide all at once,
> > and (2) facilitating the extension of an application with third-party
> > code. Usage 1 applies mostly to dynamic linking; 2 mostly to dynamic
> > loading (dlopen).
> >
> 
> Point 1 is probably the reason why most libraries end up as dynamic
> libraries.
> 
> I was wondering about distributing all libraries as static libraries and
> then having the package manager link the application statically as the final
> step of the installation. This way the package manager can keep track
> of dependencies and re-link applications when a library changes.

This is a very reasonable design. There is _some_ risk of breakage if
the static libraries depend on the application being built using the
exact same headers as the library, but most such dependencies would
also correspond to ABI breakage for the shared library, so I think the
risk is low. The main difficulty is getting applications' build
processes to stop before the final linking and give you output that
you can relink when needed.

> Distributions like Gentoo that install from source are actually in a very
> good position to take advantage of static linking.
> 
> But I can see a lot of compiling/linking happening with this approach.
> 
> Another idea would be to just install a stub where the binary would be.
> The first time you run this stub, it links the binary, stores it on the
> disk in some sort of cache, and then execs that binary. The next time you
> run the stub, it checks the cache: it re-links if the binary is not there,
> or just execs it if it is found. This way only the stuff that actually gets
> used is re-linked, and you can force a re-link by clearing the cache. This is

This approach is a bit more difficult, because you need to manage
things like who has privileges to update the binaries. You could
certainly do it with suid and/or a daemon, but it's not entirely trivial.
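
For what it's worth, the stub itself could be tiny. Here's a rough
sketch; the cache directory, the app name, and the "relink-helper"
program are all made-up placeholders for whatever mechanism actually
does the privileged linking:

#include <limits.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>

/* Hypothetical locations/names, not part of any existing design. */
#define CACHE_DIR "/var/cache/relink"
#define APP_NAME  "myapp"

int main(int argc, char *argv[])
{
	char cached[PATH_MAX];
	struct stat st;
	(void)argc;

	snprintf(cached, sizeof cached, "%s/%s", CACHE_DIR, APP_NAME);

	if (stat(cached, &st)) {
		/* Not cached yet: run the (suitably privileged) helper
		 * that performs the final static link into the cache. */
		pid_t pid = fork();
		if (!pid) {
			execlp("relink-helper", "relink-helper",
			       APP_NAME, (char *)0);
			_exit(127);
		}
		int status;
		if (pid < 0 || waitpid(pid, &status, 0) < 0
		    || !WIFEXITED(status) || WEXITSTATUS(status))
			return 127;
	}

	/* Replace the stub with the cached binary. */
	execv(cached, argv);
	perror("execv");
	return 127;
}

The privilege question is all in the helper, of course; the stub just
delegates to it and execs the result.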

> what made me wonder about programs that use dlopen.

Actually, I know one more solution for the dlopen issue, but it
requires some application-level hackery. You just link all the modules
you'll need into the main binary with a table of strings identifying
them, and make a dummy dlopen/dlsym implementation that gives you
access to stuff already linked into the application. The level of
"evil hackery" is pretty comparable to most of the stuff gnulib
does...
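
A bare-bones sketch of what I mean (the foo/bar modules and their
symbols are purely illustrative):

#include <string.h>

/* Modules that would otherwise be dlopened, linked in directly. */
extern int foo_init(void);
extern int bar_init(void);

struct sym { const char *name; void *addr; };
struct mod { const char *name; const struct sym *syms; };

static const struct sym foo_syms[] = {
	{ "foo_init", (void *)foo_init },
	{ 0, 0 }
};
static const struct sym bar_syms[] = {
	{ "bar_init", (void *)bar_init },
	{ 0, 0 }
};
static const struct mod mods[] = {
	{ "foo.so", foo_syms },
	{ "bar.so", bar_syms },
	{ 0, 0 }
};

void *dlopen(const char *file, int mode)
{
	const struct mod *m;
	(void)mode;
	for (m = mods; m->name; m++)
		if (!strcmp(m->name, file))
			return (void *)m;
	return 0;
}

void *dlsym(void *handle, const char *name)
{
	const struct sym *s;
	for (s = ((const struct mod *)handle)->syms; s->name; s++)
		if (!strcmp(s->name, name))
			return s->addr;
	return 0;
}

With static linking, the linker resolves these symbols from the
application's own objects before it ever searches libc.a, so the dummy
definitions should simply take the place of the libc ones.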

> I also wonder if the gain would be worth the trouble. I have seen a
> reduction of up to 50% in RSS usage on programs that have a lot of shared
> libraries. It should improve responsiveness, as there will be less paging.

I think the solution that achieves the best balance between reducing
bloat/slowness/paging and not spending huge amounts of effort is to
abandon the requirement of static linking everything, and instead go
with shared libraries for things that are used by a huge portion of
applications. For shared library "stacks" that have a chain of 10+ .so
files each depending on the rest, you could replace them with a single
.so file containing the whole library stack, as long as none of them
pollute the namespace horribly. This would cut most of the cost of
dynamic linking right there. For libs that aren't used by many apps,
or that are written in C++ (which results in huge dynamic-linking
bloat), I'd just use static versions.

Rich
