libc-coord - Proposing dl* extensions with explicit caller specification

Message-ID: <20220804224455.GA3232548@juliacomputing.com>
Date: Thu, 4 Aug 2022 18:44:55 -0400
From: Keno Fischer <keno@...iacomputing.com>
To: libc-coord@...ts.openwall.com
Subject: Proposing dl* extensions with explicit caller specification

Dear libc maintainers,

I'm hoping to coordinate consensus on a dlfcn API extension to
address a common paper cut that users encounter when attempting
to use various instrumentation tooling such as the {address,
memory, thread} sanitizers (and others). I don't think the
implementation is particularly difficult, but as it touches
core dlfcn API surface, some consensus would be required among
libc implementations to avoid making a mess.

# The problem

A little-known quirk of the dlsym and (on certain implementations)
dl(m)open APIs is that their behavior depends on the calling shared
object. This shared object is usually determined using
__builtin_return_address, or a hand-coded equivalent (e.g. reading
the top of the stack on x86_64 or accessing the lr register on
aarch64).
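
To make the mechanism concrete, here is a minimal sketch of how a function can
identify the shared object it was called from, using the GCC/Clang builtin
together with `dladdr`. The helper name `calling_object_path` is hypothetical
and this is not any libc's actual implementation; it only illustrates the
technique, assuming a dynamically linked program on a libc that provides
`dladdr`.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper (not part of any libc): returns the path of the
 * object that contains the call site, or NULL if it cannot be resolved.
 * noinline keeps __builtin_return_address(0) pointing into the caller. */
__attribute__((noinline))
const char *calling_object_path(void)
{
    Dl_info info;
    void *ret = __builtin_return_address(0);

    /* dladdr returns 0 if no loaded object contains the address. */
    if (dladdr(ret, &info) == 0)
        return NULL;
    return info.dli_fname;
}
```

If a wrapper sits between the real call site and this function, the result
names the wrapper's object instead, which is the crux of the problem described
here.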

This implicit dependence on the return address (apart from feeling a
bit like an API smell) breaks the ability to use symbol interposition
on these functions, as the usual interposition/RTLD_NEXT pattern will
result in the call appearing to come from a different shared object
than the non-interposed call. This is a regular cause of end user
complaints (see e.g. [1-7]).
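
For reference, the interposition pattern in question looks roughly like the
sketch below. The counter `intercepted_calls` is a hypothetical stand-in for
whatever bookkeeping a real interceptor does. Note that the libc sees the
return address inside this wrapper, so any caller-dependent behavior is
resolved relative to the object containing the wrapper rather than the object
that originally called `dlopen`.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical bookkeeping, for illustration only. */
static int intercepted_calls;

/* Interposed definition of dlopen; the real one is found via RTLD_NEXT. */
void *dlopen(const char *filename, int flags)
{
    typedef void *(*dlopen_fn)(const char *, int);
    static dlopen_fn real_dlopen;

    if (!real_dlopen)
        real_dlopen = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");
    if (!real_dlopen)
        return NULL;

    intercepted_calls++;

    /* The real dlopen now observes a return address inside this wrapper. */
    void *handle = real_dlopen(filename, flags);

    /* Work done here, after the call, is what rules out a tail call. */
    return handle;
}
```

When everything lives in one executable, as in a quick test, the difference is
not observable; it shows up once the wrapper is in a separate (e.g. preloaded)
shared object with a different RUNPATH or namespace than the original caller.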

A common suggestion is to use LD_LIBRARY_PATH in order to work around the
missing caller-dependent RUNPATH lookup. However, as I will survey below,
RUNPATH is not the only caller-dependent property (so the workaround
is incomplete) and setting LD_LIBRARY_PATH may affect lookups in other
parts of the application (or any spawned children) in undesirable ways (so
the workaround is potentially harmful to correct operation).

A different suggestion that was previously made (e.g. in [7]) is to switch
the interceptors to a tail call. Where possible, this does indeed
address the issue (e.g. rr's interceptor [8] does this and doesn't suffer
from the same problem). Unfortunately, this is not always possible. For example,
the memory sanitizer interceptor [9] needs to introspect the loaded object in
order to set up shadow memory for all newly added mappings.

The tail call issue also brings up a related concern: Compiler optimizations
do not model the return-address dependence of these functions and will thus
happily move them into tail call position when possible, raising the possibility
that a compiler upgrade will cause dynamic linker behavior to change.

# A brief survey of current caller-dependence in libcs

How the return address is used is not consistent between different libcs.
Perhaps the most consistent use of the return address is in RTLD_NEXT.
POSIX specifies that:

```
RTLD_NEXT
   Specifies the next executable object file after this one that defines name.
   This one refers to the executable object file containing the invocation of dlsym().
```

Because of the above-mentioned tail-call issue, arguably the
implementation using __builtin_return_address is not POSIX compliant,
because the return address may not necessarily be the `object containing
the invocation of dlsym`. Nevertheless, this is a minor issue and not generally
what users run into.

The more common situation of return-address dependence is in `dlopen`. POSIX
makes no mention of return-address dependence in dlopen, so implementations
differ somewhat in their use of the return address in dlopen context.

For implementations that provide the `dlmopen` extension (e.g. Solaris/Illumos,
glibc), the return address is generally used by `dlopen` to identify
the calling object's namespace.
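
As a small illustration of what "namespace" means here, the following sketch
queries the namespace ID of a loaded object through `dlinfo`. It assumes glibc
(`Lmid_t`, `RTLD_DI_LMID` and `LM_ID_BASE` are extensions enabled by
`_GNU_SOURCE`); the helper name `namespace_of` is hypothetical.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical helper: stores the namespace ID of `handle` in *out.
 * Returns 0 on success and -1 on failure, as dlinfo does. */
int namespace_of(void *handle, Lmid_t *out)
{
    return dlinfo(handle, RTLD_DI_LMID, out);
}
```

A handle for the main program, obtained with `dlopen(NULL, RTLD_NOW)`, should
report `LM_ID_BASE`, while an object loaded with
`dlmopen(LM_ID_NEWLM, ...)` would report a different ID.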

Implementations without this extension that I surveyed (e.g. musl libc, FreeBSD
libc), generally do not have caller dependence in dlopen (if there is one,
I would love to know about it so I can add it to the list).

For implementations that do look at the calling object inside dlopen, it is
generally used for a few other purposes also, including RUNPATH/RPATH handling,
lookup of certain flags, determination whether the calling object is an audit
object, etc. The RUNPATH/RPATH handling is usually the one that users complain
about, but of course the remaining uses could also introduce hard-to-diagnose
issues. Implementations that do not look at the caller in dlopen generally
use the main executable for all of these queries.

Illumos also appears to have caller-dependence in `dlclose`, `dlerror` and
`dlinfo`. I assume this is because lookup of this information is per-namespace,
but I did not look into it too closely.

# Proposed API

The proposal here (previously made independently by other people in various
forums) is to add new variants of the caller-dependent dlfcn functions
that take an explicit `dl_caller` pointer that is used in place of the return
address, e.g. for dlsym:

```
#include <dlfcn.h>

void *dlsym_caller(void *restrict handle, const char *restrict symbol, void *restrict dl_caller);
```

Naturally there would be a `dlvsym_caller` for libcs that provide the `dlvsym`
extension (and analogously for e.g. `dlfunc` on FreeBSD).

For `dlopen`, since not all implementations have caller dependence, my proposal
would be to not have `dlopen_caller`, but instead only provide `dlmopen_caller`
(since caller dependence seems to be pretty closely tied to the dlmopen
extension):

```
#include <dlfcn.h>

void *dlmopen_caller(Lmid_t lmid, const char *restrict filename, int flags, void *restrict dl_caller);
```

In order to ensure that the dlopen behavior can be emulated with this
function, I would propose promoting `LM_ID_CALLER` to an exported flag (glibc
already has an internal version of this):
```
LM_ID_CALLER
Load the shared object in the namespace of the calling object (determined
implicitly by `dlmopen` or explicitly from the `dl_caller` argument to
`dlmopen_caller`).
```
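
To show how an interceptor might use this, here is a rough sketch. Since no
libc ships `dlmopen_caller` or an exported `LM_ID_CALLER` today, the proposed
call is guarded by a hypothetical feature macro `HAVE_DLMOPEN_CALLER` and the
code falls back to plain `dlopen` otherwise. The names `intercepted_dlopen` and
`last_caller` are also assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>

/* Last caller address seen by the interceptor; illustration only. */
static void *last_caller;

/* Hypothetical interceptor. noinline keeps __builtin_return_address(0)
 * pointing into the function that called the interceptor. */
__attribute__((noinline))
void *intercepted_dlopen(const char *filename, int flags)
{
    void *caller = __builtin_return_address(0);
    last_caller = caller;

#ifdef HAVE_DLMOPEN_CALLER /* hypothetical feature macro */
    /* Proposed API: resolve namespace, RUNPATH, etc. relative to `caller`. */
    void *handle = dlmopen_caller(LM_ID_CALLER, filename, flags, caller);
#else
    /* Today: the libc treats this function, not `caller`, as the call site. */
    void *handle = dlopen(filename, flags);
#endif

    /* A sanitizer-style interceptor would inspect the new mappings here. */
    return handle;
}
```

The interceptor captures the return address once, at its own entry, and hands
it through explicitly, so the result no longer depends on whether the compiler
turns the inner call into a tail call.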

# Next steps

I hope this overview was useful as a summary of the problem I'd like
to address and the current state of the implementations. I'm not wedded
to the specifics of the proposal, so suggestions for different names or
semantics would be appreciated. I am particularly interested to know if
there are additional complications in one implementation or another that
I failed to pick up on in my survey above.

Otherwise, assuming that people generally like this proposal, I would hope
to be able to implement this in short order. I think in most implementations,
this is simply a matter of adding the appropriate symbols as the functionality
already exists. I recognize that it will probably take 10 years before this
has propagated enough to be widely available to end users, but on the other
hand, people have been complaining about this for the better part of 10 years,
so if we'd fixed it at the time, we'd already be done - better late than never
;).

Cheers,
Keno


[1] https://bugs.llvm.org/show_bug.cgi?id=27790
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=27504
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=25114
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=28008
[5] https://sourceware.org/bugzilla/show_bug.cgi?id=28927
[6] https://github.com/google/sanitizers/issues/1219
[7] https://bugzilla.redhat.com/show_bug.cgi?id=1449604
[8] https://github.com/rr-debugger/rr/blob/master/src/preload/overrides.c#L136-L143
[9] https://github.com/llvm/llvm-project/blob/8e7acb670b3830a2c72ed2a47b93f88be971eed2/compiler-rt/lib/msan/msan_interceptors.cpp#L1332-L1337