musl - Re: dlsym(handle) may search in unrelated libraries

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190207185736.GR23599@brightrain.aerifal.cx>
Date: Thu, 7 Feb 2019 13:57:36 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: dlsym(handle) may search in unrelated libraries

On Thu, Feb 07, 2019 at 07:36:38PM +0100, Markus Wichmann wrote:
> On Thu, Feb 07, 2019 at 11:54:47AM -0500, Rich Felker wrote:
> > However, a depth-first list of dependencies is also needed to solve
> > the longstanding ctor-order issue discussed in several past threads
> > (which I can look up if search doesn't turn them up for you). This is
> > not a requirement of POSIX (which doesn't even have ctors), but it's a
> > reasonable expectation we currently get wrong and I think it might be
> > specified in ELF or some related sysv "standards".
> 
> Requirements first, nice-to-haves later, right? Once we have the dlsym()
> conformance issue squared away, we can still deal with this issue.
> 
> You mean, people expect their dependencies to be initialized in their
> own initializers? When it is well-known that there is no order to
> initializers, and e.g. the C++ standard says there is no order to
> constructors of static objects.

Yes. GCC has an extension for ctor priority within static linking
scope, but for dynamic linking scope that doesn't help. I don't like
any of this but glib depends on it to avoid just doing the right thing
with pthread_once/call_once, and refuses to fix it.

> > I'm not sure if it makes sense to build both of these lists at the
> > same time. We currently try to avoid allocating dependency lists for
> > libraries loaded at startup in order not to allocate and setup data
> > that might never be used, defering until if/when dlopen is called on
> > them. I've wanted to do the ctor order walk without allocating a
> > dependency list, but I don't know a good way to do so. Note that the
> > code that runs ctors cannot allocate because, at the time it runs,
> > dlopen has already "committed" the load and can't back it out. It has
> > to have all resources it needs for the walk precommitted.
> 
> Depth first means stack. Means recursion or explicit stack. Explicit is
> going to be hard, considering there is no known limit to dependency
> nesting depth. We could argue that the user better make their stack size
> ulimit large enough for the main thread, but I hazard a guess that
> nobody expects dlopen() to use overly much stack in other threads.
> 
> Alright, what's the algorithm here?
> 
> init(lib):
>     if (!lib.inited):
>         foreach d in lib.deps init(d)
>         start_init(lib)
>         lib.inited = 1
> 
> That about right? Because that means we need a stack as high as the
> nesting depth of dependencies. We can get a pessimistic estimation from
> the number of deps loaded, but that may be way too high (see below).

Yes, but you can also avoid recursion just by looping to the deepest
dependency with !inited, then going back to the root. For a one-time
operation at dlopen-time or program-start time, the quadratic search
for each !inited seems unlikely to be a problem:

> > Since the beginning of a list of breadth-first dependencies is just
> > the direct dependencies, if we recorded the number of direct
> > dependencies for each dso, the breadth-first lists could be used to
> > perform a depth-first walk to run ctors; this can be made
> > non-recursive, at the cost of quadratic time but with trivially-tiny
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > constant factor, by simply restarting at the root of the tree after
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > each node and finding the deepest not-yet-constructed dso. This
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> $ ldd =mpv | wc -l
> 292
> 
> And that, I will remind you, is the "suckless" alternative to mplayer.
> 
> Anyway, that is breadth, not depth. Experimentally, it is hard to find a
> chain of libraries deeper than ten. They all eventually just lead back
> to libc.
> 

> So what you could do, when it comes time to run the initializers, is to
> walk the dyn vectors again. Those are already allocated, and all the
> libs they reference are already loaded, so load_library() would only be
> a search operation.

Note that each load_library is a linear search, so you still have the
quadratic time, now with much higher coefficient. And if you want to
use my go-back-to-root approach to avoid having O(depth) working
space, it's cubic time.

Also, as-written, load_library is not "only a search operation";
perhaps this is a bug in itself. For absolute paths it will always
open the file and compare inode identity to see if it matches an
already-loaded file. This should probably be bypassed by first
checking long-name. Fortunately absolute paths in DT_NEEDED are
strongly discouraged/a-really-bad-idea, so it probably doesn't come up
much in practice.

> By the time the initializers run, that must be the
> case, both at load time and at runtime. Unless loading the library
> failed the first time, in which case at runtime we have already bailed.
> And at load time we... haven't. Are you sure continuing on load failure
> of a dependency during load time is a good idea?

I don't follow. The dlopen operation is not committed until load of
all dependencies completes successfully, and if any fail to load, the
whole operation is backed-out. But ctors don't/can't run until *after*
that, when we've already committed to success.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.