Message-ID: <5703c3fd-c4aa-4ec1-98b5-69626eec71cc@sholland.org>
Date: Mon, 19 Jan 2026 00:18:17 -0600
From: Samuel Holland <samuel@...lland.org>
To: Alyssa Ross <hi@...ssa.is>, Demi Marie Obenour <demiobenour@...il.com>
Cc: Rich Felker <dalias@...c.org>, Vivian Wang <wangruikang@...as.ac.cn>,
 matthewcroughan <matt@...ughan.sh>, musl@...ts.openwall.com
Subject: Re: [PATCH] ldso: Use rpath of dso of caller in dlopen

On 11/13/25 16:51, Alyssa Ross wrote:
> [Resending as I somehow messed up the Cc line.]
> 
> On Wed, Nov 12, 2025 at 02:34:25PM -0500, Demi Marie Obenour wrote:
>> On 11/12/25 12:14, Rich Felker wrote:
>>> On Fri, Oct 17, 2025 at 06:50:52PM +0800, Vivian Wang wrote:
>>>> Grab the return address using an arch-specific dlopen wrapper that calls
>>>> a generic __dlopen (analogous to dlsym and __dlsym), and use it to find
>>>> the dso to use as needed_by for load_library in __dlopen. This way, when
>>>> a dso calls dlopen, the library is searched from *this* dso's rpath.
>>>>
>>>> This feature is used by shared libraries that, on demand, dlopen other
>>>> shared libraries found in nonstandard paths.
>>>>
>>>> This makes the behavior of DT_RUNPATH match glibc better. Also, since we
>>>> already use this behavior with libraries loaded with DT_NEEDED, adding
>>>> support for dlopen makes it more consistent.
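To make "searched from *this* dso's rpath" concrete: a DT_RUNPATH value is a colon-separated list of directories, and a bare library name is tried as dir/name in each of them, in order. The following is a minimal C sketch of that enumeration only, under simplifying assumptions (no $ORIGIN expansion, no LD_LIBRARY_PATH or default system directories, empty entries skipped). runpath_candidate is a hypothetical helper, not musl's internal search code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the k-th candidate path ("dir/name") for a library name searched
 * along a colon-separated DT_RUNPATH string (hypothetical helper).
 * Returns 1 and fills `out` on success; returns 0 if there is no k-th
 * directory or the buffer is too small.
 * Simplified: no $ORIGIN expansion, empty entries are skipped. */
static int runpath_candidate(const char *runpath, const char *name,
                             size_t k, char *out, size_t outsz)
{
    assert(runpath != NULL && name != NULL && out != NULL);
    const char *p = runpath;
    size_t idx = 0;
    while (*p) {
        size_t len = strcspn(p, ":");   /* length of this directory entry */
        if (len > 0) {
            if (idx == k) {
                int n = snprintf(out, outsz, "%.*s/%s", (int)len, p, name);
                return n >= 0 && (size_t)n < outsz;
            }
            idx++;
        }
        p += len;
        if (*p == ':')
            p++;                        /* step over the separator */
    }
    return 0;
}
```

For example, asking for candidate 1 of "libfoo.so.1" along "/opt/a/lib:/opt/b/lib" yields "/opt/b/lib/libfoo.so.1". The patch does not change this search itself; it changes which object's runpath string is fed into it.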
>>>>
>>>> By coincidence, both __dlsym and __dlopen take three arguments, the last
>>>> of which is the return address. Therefore all of the arch-specific
>>>> src/ldso/*/dlopen.s is just the corresponding dlsym.s with "dlsym"
>>>> replaced by "dlopen".
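The core of the patch is mapping the return address, passed as __dlopen's third argument, back to the loaded object that contains it, and using that object as needed_by. Below is a minimal sketch of that lookup. It assumes a flat table of objects with known address ranges. struct model_dso, addr_to_dso and pick_needed_by are hypothetical names. Falling back to the first entry, which stands in for the main program, when no object matches is an assumption of this model, not a statement about musl's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record for a loaded object (not musl's struct dso). */
struct model_dso {
    const char *name;
    uintptr_t base;       /* first mapped address */
    uintptr_t end;        /* one past the last mapped address */
    const char *runpath;  /* DT_RUNPATH string, or NULL if absent */
};

/* Return the object whose mapping contains addr, or NULL if none does. */
static const struct model_dso *addr_to_dso(const struct model_dso *objs,
                                           size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= objs[i].base && addr < objs[i].end)
            return &objs[i];
    return NULL;
}

/* Choose the object whose runpath the search should use, given the
 * caller's return address ra. objs[0] stands in for the main program and
 * is the fallback when ra lies in no known mapping (a modeling assumption). */
static const struct model_dso *pick_needed_by(const struct model_dso *objs,
                                              size_t n, uintptr_t ra)
{
    assert(n > 0);
    const struct model_dso *caller = addr_to_dso(objs, n, ra);
    return caller ? caller : &objs[0];
}
```

Because ra identifies only the immediate caller, a library that reaches dlopen through a helper in another object, or through a tail call, is resolved to that other object. This is the limitation raised in Rich's reply.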
>>>
>>> I'm not convinced that this is a good change. With dlsym, behaving
>>> differently based on the call point is optional nonstandard
>>> functionality triggered by passing RTLD_NEXT, and it already has
>>> problems. In particular, the return address does not properly
>>> determine who the caller is; it will be wrong if there's a tail call
>>> to dlsym. We've considered in the past making a new definition for
>>> RTLD_NEXT that uses the address of an object in the translation unit
>>> that uses RTLD_NEXT, which would fix this but has other subtly
>>> different behavior (like if RTLD_NEXT isn't passed directly to dlsym
>>> but to a wrapper for it in a different library) so it's not clear if
>>> it would be a worthwhile improvement.
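The tail-call problem and the alternative mentioned above can be illustrated in C. A return address (what GCC and Clang expose as __builtin_return_address(0)) tells the callee where it will return to. After a tail call, that is the caller's caller. The address of an object defined in the calling translation unit does not have this problem, because it is fixed where the macro is expanded. The following is a minimal sketch, assuming a hypothetical header that defines the anchor. tu_anchor, CALLER_TOKEN and forward_token are illustrative names only, not a proposed musl API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header contents: each translation unit that includes this
 * gets its own private anchor, whose address lies inside the object
 * (executable or DSO) built from that translation unit. */
static const char tu_anchor = 0;
#define CALLER_TOKEN ((const void *)&tu_anchor)

/* Stand-in for an intermediate function, possibly in another library,
 * that passes the token along before the real lookup happens. Unlike a
 * return address, the token is unaffected by how many calls or tail
 * calls lie in between. */
static const void *forward_token(const void *token)
{
    assert(token != NULL);
    return token;
}
```

The forwarding helper also shows the subtle difference noted above. If the token is handed to a wrapper in another library, it still names the object that expanded the macro, not the wrapper.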
>>>
>>> In addition to violating least-surprise and having a nonstandard
>>> behavior always active, changing dlopen as in this patch would have
>>> the same tail-call issue, and would only give the behavior some
>>> callers want if dlopen is directly called. For example if you had
>>> loaded a library with its own rpath and instead of calling dlopen, it
>>> called some other-library-provided abstraction for loading modules
>>> that in turn indirectly called dlopen, its rpath would not get used.
>>> This seems confusing and undesirable.
>>
>> I don't think that this violates least-surprise.  At least systemd
>> assumes glibc behavior, and I would not be surprised if other programs
>> and libraries do as well.
> 
> Technically speaking I don't think it's systemd that assumes Glibc
> behaviour.  systemd just puts .note.dlopen sections in its executables
> and libraries, and it's up to the packaging system to use that metadata
> to ensure the mentioned shared libraries are available if desired.
> In the scenario I think we're all coming from, it's Nixpkgs'
> autoPatchelfHook that has turned those into DT_RUNPATH entries.
> Presumably this was only tested with Glibc, and we're only discovering
> it now because people are getting more adventurous with the combination
> of Nixpkgs, musl, and systemd.
> 
> Given what I've read here, perhaps the easiest way forward would be to
> get systemd to (perhaps optionally) use absolute paths for dlopen() of
> optional dependencies like this.

I encountered this problem in the same scenario too, and I agree that
for something like nixpkgs, using absolute paths is the ideal solution:
each library path is fixed and known in advance, and absolute paths
avoid the (quadratic) DT_RUNPATH search entirely.
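Both halves of that argument can be modeled in a few lines. A name containing a slash is used as a path directly, which is POSIX dlopen behavior. A bare soname, by contrast, may have to be tried in every DT_RUNPATH directory. The following is a minimal sketch that counts only DT_RUNPATH entries and ignores LD_LIBRARY_PATH and the default directories. worst_case_probes is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Worst-case number of open attempts needed to locate `name`, counting only
 * the DT_RUNPATH directories (LD_LIBRARY_PATH and default directories are
 * ignored). A name containing '/' is used as a path directly: one attempt. */
static size_t worst_case_probes(const char *name, const char *runpath)
{
    assert(name != NULL);
    if (strchr(name, '/'))
        return 1;
    size_t dirs = 0;
    for (const char *p = runpath ? runpath : ""; *p; ) {
        size_t len = strcspn(p, ":");
        if (len > 0)
            dirs++;          /* one candidate directory per non-empty entry */
        p += len;
        if (*p == ':')
            p++;
    }
    return dirs;
}
```

Suppose a program loads N libraries by soname, and its runpath lists one directory per dependency, as is typical when every package lives in its own store path. Then the worst case is about N * N probes. With absolute paths it is N.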

Another less invasive solution is to have autoPatchelfHook propagate
DT_RUNPATH paths from the DSO with the .note.dlopen entries to each
executable that depends on that DSO. A minimal implementation of this
idea[1] was sufficient to get systemd working.

[1]: https://github.com/NixOS/nixpkgs/pull/475746
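The propagation step amounts to merging one colon-separated runpath into another without duplicates. The following is a minimal sketch of only that string operation. It assumes entries are plain directories compared byte for byte. runpath_has and propagate_runpath are hypothetical helpers and are not taken from the linked pull request. Finding the affected libraries and rewriting DT_RUNPATH in the ELF files are out of scope here.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the first `len` bytes of `dir` already appear as a complete
 * entry in the colon-separated `list`, else 0. */
static int runpath_has(const char *list, const char *dir, size_t len)
{
    const char *p = list;
    while (*p) {
        size_t l = strcspn(p, ":");
        if (l == len && strncmp(p, dir, len) == 0)
            return 1;
        p += l;
        if (*p == ':')
            p++;
    }
    return 0;
}

/* Append to exe_runpath (in place, buffer capacity `cap`) every entry of
 * lib_runpath that is not already present, keeping existing entries first.
 * Returns 0 on success, -1 if the buffer is too small (entries appended
 * before the failure are kept). */
static int propagate_runpath(char *exe_runpath, size_t cap,
                             const char *lib_runpath)
{
    assert(exe_runpath != NULL && lib_runpath != NULL);
    const char *p = lib_runpath;
    while (*p) {
        size_t len = strcspn(p, ":");
        if (len > 0 && !runpath_has(exe_runpath, p, len)) {
            size_t cur = strlen(exe_runpath);
            size_t need = cur + (cur ? 1 : 0) + len + 1;  /* sep + entry + NUL */
            if (need > cap)
                return -1;
            if (cur)
                exe_runpath[cur++] = ':';
            memcpy(exe_runpath + cur, p, len);
            exe_runpath[cur + len] = '\0';
        }
        p += len;
        if (*p == ':')
            p++;
    }
    return 0;
}
```

Existing entries keep their order and the library's entries are appended, so the executable's own directories are still searched first.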
