Date: Thu, 23 Apr 2020 12:24:06 -0400
From: Rich Felker <>
To: Paul Sokolovsky <>,
Subject: Re: foreign-dlopen: dlopen() from static binary, again (and
 not the way you think!)

On Thu, Apr 23, 2020 at 02:22:34PM +0200, Szabolcs Nagy wrote:
> * Paul Sokolovsky <> [2020-04-23 12:16:26 +0300]:
> > Hello,
> > 
> > On Wed, 22 Apr 2020 22:39:41 -0400
> > Rich Felker <> wrote:
> > 
> > []
> > 
> > > > Oh, forgot to say that I'm not looking for a way to load a
> > > > particular musl-dynlinked shared library into musl-staticlinked
> > > > binary. So, arguments like "but you'll need to carry around musl's
> > > >" don't apply. What I'm looking for is a way to have a
> > > > static closed-world application, but let it, at the user's request,
> > > > to interface with whatever system may be outside.
> > []
> > > > of concept code is at
> > .  
> > > 
> > > In your example it looks like you're foreign_dlopen'ing glibc. That
> > > simply *can't* work, because part of the interface contract of all
> > > glibc functions is that they're called with the thread pointer
> > > register (%gs or %fs on i386 or x86_64 respectively) pointing to a
> > > glibc TCB, which will not be the case when they're invoked from a
> > > musl-linked (or other non-glibc-linked) program.
> > 
> > Thanks for the response and for the word of warning. As I mentioned,
> > this is essentially a proof of concept, and so far was tested only by
> > calling glibc's printf() from a host app which was either linked with
> > glibc itself or -nostdlib and static. And that was already more than
> > with any other ELF loader which I tried (which worked for simple
> > functions like write(), but crashed in anything more complex like
> > printf()).
> > 
> > But it certainly doesn't touch a case you describe, when "foreign" vs
> > local libc expect different values of %gs/%fs (so apparently, "foreign
> > function call" facility would need to swap them around a call).
> yes, libc functions should be called on libc owned
> threads and your code can only run on the same thread if
> you follow the same abi (which is more than just the
> call convention), swapping the thread pointer means that
> the foreign libc has to create the thread on which you
> invoke the foreign function (or it has to be the main
> thread) since the data structures at tp are set up at
> thread creation (or early libc init for the main thread).
> what's worse is that some process global state also
> has to be under the control of libc (e.g. libc internal
> signal handlers or global state controlled via prctl or
> libc may want fd 0,1,2 in a particular state) so cross
> calling a different libc involves system calls (e.g. the
> go runtime gets this wrong for obvious reasons: calling
> c from go would be really slow, this is why you normally
> try to avoid using your own libc independent runtime.
> go gets away with this because libc internal signals are
> rarely relevant and most process state is per thread on
> linux so if you let the foreign libc to create the os
> threads and take over the signal handlers and signal
> masks then things work)

Yes, I don't think the "swap the thread pointer" approach works. And
even if not for the other global state you pointed out, swapping the
thread pointer is not safe if any signal handler may run, including
even implementation-internal signals which you can't block. Moreover
libc could even implement its own signal layer where underlying
kernel signals aren't blocked just because they're blocked from the
application's perspective. Any attempt to run a foreign libc in the
same process is inherently going to be poking at implementation
internals that are not stable interfaces you can make use of.
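
To make the "signals which you can't block" point concrete, here is a
minimal sketch, assuming a Linux libc (glibc or musl) where SIGRTMIN is
defined and the implementation reserves a few signal numbers below it
for internal use. The function name is hypothetical. It lists the
signal numbers that sigfillset() leaves out of a "full" set. A foreign
libc's handler for such a signal could still run while the thread
pointer is swapped, even after the caller believes it has blocked
everything.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/*
 * Store in out[] (at most max entries) the signal numbers below
 * SIGRTMIN that sigfillset() does not put into the set, and return
 * how many were stored. On Linux these are typically the signals the
 * libc keeps for itself (cancellation, set*id synchronization, ...).
 */
int signals_missing_from_full_set(int *out, int max)
{
    sigset_t set;
    int n = 0;

    sigfillset(&set);
    for (int sig = 1; sig < SIGRTMIN && n < max; sig++) {
        /* sigismember() returns 1 only when sig is in the set. */
        if (sigismember(&set, sig) != 1)
            out[n++] = sig;
    }
    return n;
}
```

Which numbers show up, if any, is specific to the libc and its version,
so treat the result as an observation about one implementation rather
than as an interface to rely on.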

> > > If you relax to the case where you're not doing that, and instead only
> > > opening *pure library* code which has no tie-in to global state or TLS
> > > contracts, then it should be able to work.
> it's not documented what api is implemented as pure
> library code and in principle libc code may call
> other libc code via plt and then lazy binding can
> happen which is not pure. (glibc tries to avoid this
> of course, but it does have some runtime loaded
> components e.g. for locale specific char conversions
> so things that may seem pure from the outside can end
> up unpure).

I'm referring to pure library code that doesn't even link libc, much
less that's part of libc. A .so file with no DT_NEEDED at all (linked
with -nostdlib).
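
One way to check that property before loading a candidate library is
to look for DT_NEEDED entries in its dynamic section. The following is
only a sketch: the function name is hypothetical, and it assumes a
64-bit ELF image in the host's byte order that has already been read
or mapped into memory. It uses the type and constant definitions from
<elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Count DT_NEEDED entries in an ELF64 image held in memory.
 * Returns the count (0 means no library dependencies are declared),
 * or -1 if the image is not a well-formed ELF64 file.
 * Assumption: the image uses the host's byte order.
 */
int count_dt_needed(const unsigned char *img, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    if (eh.e_phnum && eh.e_phentsize != sizeof(Elf64_Phdr))
        return -1;
    /* The whole program header table must lie inside the image. */
    if (eh.e_phoff > len ||
        (len - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return -1;

    int needed = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;

        memcpy(&ph, img + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_DYNAMIC)
            continue;
        /* The dynamic section must lie inside the image as well. */
        if (ph.p_offset > len || ph.p_filesz > len - ph.p_offset)
            return -1;

        size_t n = ph.p_filesz / sizeof(Elf64_Dyn);
        for (size_t j = 0; j < n; j++) {
            Elf64_Dyn d;

            memcpy(&d, img + ph.p_offset + j * sizeof d, sizeof d);
            if (d.d_tag == DT_NULL)
                break;          /* end of the dynamic array */
            if (d.d_tag == DT_NEEDED)
                needed++;
        }
    }
    return needed;
}
```

A result of 0 only says the file declares no dependencies. It does not
prove the code avoids TLS or other global state, so it is a necessary
check rather than a sufficient one. The same information is visible
with "readelf -d" on the file by looking for NEEDED entries.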