Message-ID: <lpmicdqrzs2h22orlngnrqzntafnpmse32sxkmyjkoh35h3yqg@qhkgehymcc6v>
Date: Thu, 13 Nov 2025 23:51:11 +0100
From: Alyssa Ross <hi@...ssa.is>
To: Demi Marie Obenour <demiobenour@...il.com>
Cc: Rich Felker <dalias@...c.org>, Vivian Wang <wangruikang@...as.ac.cn>,
matthewcroughan <matt@...ughan.sh>, musl@...ts.openwall.com
Subject: Re: [PATCH] ldso: Use rpath of dso of caller in dlopen

[Resending as I somehow messed up the Cc line.]

On Wed, Nov 12, 2025 at 02:34:25PM -0500, Demi Marie Obenour wrote:
> On 11/12/25 12:14, Rich Felker wrote:
> > On Fri, Oct 17, 2025 at 06:50:52PM +0800, Vivian Wang wrote:
> >> Grab the return address using an arch-specific wrapper dlopen calling a
> >> generic __dlopen (analogous to dlsym and __dlsym), and use it to find
> >> the dso to use as needed_by for load_library in __dlopen. This way, when
> >> a dso calls dlopen, the library is searched from *this* dso's rpath.
> >>
> >> This feature is used by shared libraries that dlopen on demand other
> >> shared libraries found in nonstandard paths.
> >>
> >> This makes the behavior of DT_RUNPATH match glibc better. Also, since we
> >> already use this behavior with libraries loaded with DT_NEEDED, adding
> >> support for dlopen makes it more consistent.
> >>
> >> By coincidence, both __dlsym and __dlopen take three arguments, the last
> >> of which is the return address. Therefore all of the arch-specific
> >> src/ldso/*/dlopen.s is just the corresponding dlsym.s with "dlsym"
> >> replaced by "dlopen".
> >
> > I'm not convinced that this is a good change. With dlsym, behaving
> > differently based on the call point is optional nonstandard
> > functionality triggered by passing RTLD_NEXT, and it already has
> > problems. In particular, the return address does not properly
> > determine who the caller is; it will be wrong if there's a tail call
> > to dlsym. We've considered in the past making a new definition for
> > RTLD_NEXT that uses the address of an object in the translation unit
> > that uses RTLD_NEXT, which would fix this but has other subtly
> > different behavior (like if RTLD_NEXT isn't passed directly to dlsym
> > but to a wrapper for it in a different library) so it's not clear if
> > it would be a worthwhile improvement.
> >
> > In addition to violating least-surprise and having a nonstandard
> > behavior always active, changing dlopen as in this patch would have
> > the same tail-call issue, and would only give the behavior some
> > callers want if dlopen is directly called. For example if you had
> > loaded a library with its own rpath and instead of calling dlopen, it
> > called some other-library-provided abstraction for loading modules
> > that in turn indirectly called dlopen, its rpath would not get used.
> > This seems confusing and undesirable.
>
> I don't think that this violates least-surprise. At least systemd
> assumes glibc behavior, and I would not be surprised if other programs
> and libraries do as well.

Technically speaking I don't think it's systemd that assumes Glibc
behaviour. systemd just puts .note.dlopen sections in its executables
and libraries, and it's up to the packaging system to use that metadata
to ensure the mentioned shared libraries are available if desired.

In the scenario I think we're all coming from, it's Nixpkgs'
autoPatchelfHook that has turned those into DT_RUNPATH entries.
Presumably this was only tested with Glibc, and we're only discovering
it now because people are getting more adventurous with the combination
of Nixpkgs, musl, and systemd.

Given what I've read here, perhaps the easiest way forward would be to
get systemd to (perhaps optionally) use absolute paths for dlopen() of
optional dependencies like this.
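
To illustrate the absolute-path idea, here is a minimal sketch in C. It is
not taken from systemd: the helper names, the directory argument, and the
way a directory would be chosen at build time are all assumptions for
illustration. It relies only on the fact that a dlopen() argument containing
a slash is treated as a pathname, so neither the caller's DT_RUNPATH nor the
return address matters.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: join a directory (e.g. one recorded at build time)
 * with a soname. Returns 0 on success, -1 if dir is not absolute or the
 * result does not fit in buf. */
static int optional_dep_path(char *buf, size_t len,
                             const char *dir, const char *soname)
{
    if (dir[0] != '/')
        return -1;
    int n = snprintf(buf, len, "%s/%s", dir, soname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: open an optional dependency by absolute path.
 * Because the name contains '/', it is opened as a pathname and no
 * library search path is consulted. Returns NULL if unavailable. */
static void *open_optional_dep(const char *dir, const char *soname)
{
    char path[4096];
    if (optional_dep_path(path, sizeof path, dir, soname) < 0)
        return NULL;
    return dlopen(path, RTLD_NOW | RTLD_LOCAL);
}
```

A caller would treat a NULL return as "feature not available", the same way
it would for a failed dlopen() by bare soname.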