Date: Fri, 24 Jun 2016 13:03:47 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: musl ldd: swt build: Error relocating / symbol not found

On Fri, Jun 24, 2016 at 04:23:55PM +0000, Andrei Pozolotin wrote:
> Szabolcs:
> 
> On 06/23/2016 11:15 PM, Szabolcs Nagy wrote:
> > * Andrei Pozolotin <andrei.pozolotin@...il.com> [2016-06-23 19:42:44 +0000]:
> >> b) while at the same time musl ldd reporting that library dependency
> >> tree is resolved with no error:
> >>
> >> lddtree /usr/lib/libswt-atk-gtk-4530.so
> > that's not musl's ldd, but scanelf from pax-utils
> thank you for pointing out.
> > when debugging such a complicated setup the output
> > of tools that may use subtly different library paths
> > and symbol resolution logic is not very helpful.
> ok, got it.
> > ldd /usr/lib/libswt-gtk-4530.so
> ldd /usr/lib/libswt-gtk-4530.so
>         ldd (0x55e333e6c000)
>         libc.musl-x86_64.so.1 => ldd (0x55e333e6c000)
> > ldd /usr/lib/libswt-atk-gtk-4530.so
> ldd /usr/lib/libswt-atk-gtk-4530.so
>         ldd (0x55edc6edc000)
>         libatk-1.0.so.0 => /usr/lib/libatk-1.0.so.0 (0x7fc763298000)
>         libc.musl-x86_64.so.1 => ldd (0x55edc6edc000)
>         libgobject-2.0.so.0 => /usr/lib/libgobject-2.0.so.0 (0x7fc763058000)
>         libglib-2.0.so.0 => /usr/lib/libglib-2.0.so.0 (0x7fc762d6d000)
>         libintl.so.8 => /usr/lib/libintl.so.8 (0x7fc762b5f000)
>         libffi.so.6 => /usr/lib/libffi.so.6 (0x7fc762957000)
>         libpcre.so.1 => /usr/lib/libpcre.so.1 (0x7fc7626fe000)
> > would be more interesting..
> >
> > but even then we don't know what's going on
> > (if libswt-gtk-4530.so is dlopened with RTLD_LOCAL
> > then its libgobject dependency might not be visible
> > to libswt-atk-gtk-4530)
> OK. here is the story:
> 
> * java native interface: NativeLibrary.load()
> http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/74e5fc94c77b/src/share/classes/java/lang/ClassLoader.java#l1726
> 
> * java JNI implementation:
> Java_java_lang_ClassLoader_00024NativeLibrary_load
> http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/74e5fc94c77b/src/share/native/java/lang/ClassLoader.c#l369
> 
> * libjvm.so entry point: os::dll_load
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/tip/src/share/vm/prims/jvm.cpp#l3959
> 
> * libjvm.so linux implementation
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/4529ee76d3f9/src/share/vm/runtime/os.hpp#l564
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/4529ee76d3f9/src/os/linux/vm/os_linux.cpp#l1773
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/4529ee76d3f9/src/os/linux/vm/os_linux.cpp#l1767
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/4529ee76d3f9/src/os/linux/vm/os_linux.cpp#l1997
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/4529ee76d3f9/src/os/linux/vm/os_linux.cpp#l1988
> 
> * and finally: it says: dlopen RTLD_LAZY:
> http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/4529ee76d3f9/src/os/linux/vm/os_linux.cpp#l1988
> void * result = ::dlopen(filename, RTLD_LAZY);
> 
> http://linux.die.net/man/3/dlopen
> RTLD_LAZY: Perform lazy binding. Only resolve symbols as the code that
> references them is executed.
> If the symbol is never referenced, then it is never resolved.
> (Lazy binding is only performed for function references;
> references to variables are always immediately bound when the library is
> loaded.)
> 
> RTLD_LAZY is good, right? :-)

OK, this is likely the root of the problem: invalid code assuming that
it can load libraries with undefined symbols as long as it doesn't try
to use those code paths.

The man page you linked to is rather poor-quality. With RTLD_LAZY, the
time at which symbol binding takes place is actually
implementation-defined and can be anywhere between the time of dlopen
and the time of use. The flag should be treated only as a hint for
allowing performance optimizations, not as something that gives the
caller permission to do erroneous things.
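
To make the portable behavior concrete, here is a minimal sketch of a
loader that does not rely on late binding at all. It asks for
RTLD_NOW, so a library with undefined symbols is rejected by dlopen
itself and the reason is available from dlerror. The helper name
load_checked and its error-buffer interface are hypothetical, made up
for this illustration; only dlopen, dlerror and the RTLD_* flags are
real API. Older glibc versions may need -ldl at link time.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from any real code base): open a library
 * with immediate binding, so unresolved symbols are reported here
 * instead of at some later call. On failure, returns NULL and copies
 * the dlerror() text into err.
 */
static void *load_checked(const char *path, char *err, size_t errlen)
{
    void *handle;

    dlerror(); /* discard any stale error state */
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        const char *msg = dlerror();
        snprintf(err, errlen, "%s", msg ? msg : "unknown dlopen error");
    } else if (errlen > 0) {
        err[0] = '\0';
    }
    return handle;
}
```

A caller that gets NULL back can print err and stop, rather than
finding out about the missing symbol on some rarely used code path.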

Aside from formal correctness, there are multiple reasons for this.
Whether late binding is even possible at all depends on the
architecture and on link-time options, and musl purposefully does not
implement lazy binding because it's a huge surface for bugs (which you
can see by looking at glibc's history of bugs caused by lazy binding).

There's one other well-known piece of software, x.org, abusing
RTLD_LAZY in the same way, and we have discussed possible workarounds
before. It would be possible to accept relocations with undefined
symbol references at dlopen time by storing a list of them, and rather
than lazily processing them at call time, re-process them after each
additional dlopen. This would allow broken programs to work without
introducing the bug surface that actual lazy binding would introduce.
However it's a fairly big task to add, and it would be much nicer just
to get the buggy programs fixed (there are already reasonable
workarounds for x.org).
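
The deferred-relocation idea above can be sketched as a toy model.
This is not musl code and does not reflect how musl's dynamic linker
is structured; every name here (toy_linker, toy_dlopen, and so on) is
hypothetical, and the "symbol table" is just a flat array. It only
shows the bookkeeping: undefined references are queued at load time
and retried after each later load, never at call time.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_SYMS    32
#define TOY_MAX_PENDING 32

struct toy_sym   { const char *name; void *addr; };   /* a definition */
struct toy_reloc { const char *name; void **slot; };  /* a reference  */

struct toy_linker {
    struct toy_sym   syms[TOY_MAX_SYMS];
    size_t           nsyms;
    struct toy_reloc pending[TOY_MAX_PENDING];
    size_t           npending;
};

/* First matching definition wins, as in a global symbol search. */
static void *toy_lookup(const struct toy_linker *l, const char *name)
{
    for (size_t i = 0; i < l->nsyms; i++)
        if (strcmp(l->syms[i].name, name) == 0)
            return l->syms[i].addr;
    return NULL;
}

/* Retry every queued relocation; keep the ones still unresolved. */
static size_t toy_reprocess(struct toy_linker *l)
{
    size_t kept = 0;
    for (size_t i = 0; i < l->npending; i++) {
        void *addr = toy_lookup(l, l->pending[i].name);
        if (addr)
            *l->pending[i].slot = addr;
        else
            l->pending[kept++] = l->pending[i];
    }
    l->npending = kept;
    return kept;
}

/*
 * "Load" a library: register its definitions, queue its references,
 * then re-process the whole queue. Returns the number of relocations
 * still pending, or (size_t)-1 if the fixed-size tables would overflow.
 */
static size_t toy_dlopen(struct toy_linker *l,
                         const struct toy_sym *defs, size_t ndefs,
                         const struct toy_reloc *relocs, size_t nrelocs)
{
    if (l->nsyms + ndefs > TOY_MAX_SYMS ||
        l->npending + nrelocs > TOY_MAX_PENDING)
        return (size_t)-1;
    for (size_t i = 0; i < ndefs; i++)
        l->syms[l->nsyms++] = defs[i];
    for (size_t i = 0; i < nrelocs; i++)
        l->pending[l->npending++] = relocs[i];
    return toy_reprocess(l);
}
```

In this model, loading a library that references an undefined symbol
leaves one entry pending, and a later toy_dlopen that supplies the
definition fills in the slot. Nothing is patched at call time, which
is the point of the approach.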

Rich