Message-ID: <20251113004931.GL1827@brightrain.aerifal.cx>
Date: Wed, 12 Nov 2025 19:49:32 -0500
From: Rich Felker <dalias@...c.org>
To: Demi Marie Obenour <demiobenour@...il.com>
Cc: musl@...ts.openwall.com, Vivian Wang <wangruikang@...as.ac.cn>,
	matthewcroughan <matt@...ughan.sh>
Subject: Re: [PATCH] ldso: Use rpath of dso of caller in dlopen

On Wed, Nov 12, 2025 at 02:34:25PM -0500, Demi Marie Obenour wrote:
> On 11/12/25 12:14, Rich Felker wrote:
> > On Fri, Oct 17, 2025 at 06:50:52PM +0800, Vivian Wang wrote:
> >> Grab the return address using an arch-specific wrapper dlopen calling a
> >> generic __dlopen (analogous to dlsym and __dlsym), and use it to find
> >> the dso to use as needed_by for load_library in __dlopen. This way, when
> >> a dso calls dlopen, the library is searched from *this* dso's rpath.
> >>
> >> This feature is used by shared libraries that dlopen on demand other
> >> shared libraries found in nonstandard paths.
> >>
> >> This makes the behavior of DT_RUNPATH match glibc better. Also, since we
> >> already use this behavior with libraries loaded with DT_NEEDED, adding
> >> support for dlopen makes it more consistent.
> >>
> >> By coincidence, both __dlsym and __dlopen take three arguments, the last
> >> of which is the return address. Therefore all of the arch-specific
> >> src/ldso/*/dlopen.s is just the corresponding dlsym.s with "dlsym"
> >> replaced by "dlopen".
> > 
> > I'm not convinced that this is a good change. With dlsym, behaving
> > differently based on the call point is optional nonstandard
> > functionality triggered by passing RTLD_NEXT, and it already has
> > problems. In particular, the return address does not properly
> > determine who the caller is; it will be wrong if there's a tail call
> > to dlsym. We've considered in the past making a new definition for
> > RTLD_NEXT that uses the address of an object in the translation unit
> > that uses RTLD_NEXT, which would fix this but has other subtly
> > different behavior (like if RTLD_NEXT isn't passed directly to dlsym
> > but to a wrapper for it in a different library) so it's not clear if
> > it would be a worthwhile improvement.
> > 
> > In addition to violating least-surprise and having a nonstandard
> > behavior always active, changing dlopen as in this patch would have
> > the same tail-call issue, and would only give the behavior some
> > callers want if dlopen is directly called. For example if you had
> > loaded a library with its own rpath and instead of calling dlopen, it
> > called some other-library-provided abstraction for loading modules
> > that in turn indirectly called dlopen, its rpath would not get used.
> > This seems confusing and undesirable.
> 
> I don't think that this violates least-surprise.  At least systemd
> assumes glibc behavior, and I would not be surprised if other programs
> and libraries do as well.

I think we're going by different definitions of least-surprise. What I
mean is that it is extremely abnormal, and impossible within what you
can implement in the standard language, for a function to behave
differently depending on where it's called from.
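
To illustrate (a minimal sketch, not code from musl or from the patch):
the only way to observe the call point from C is a compiler extension
such as the GCC/Clang builtin __builtin_return_address. All names below
(who_called_me, lookup_wrapper, and so on) are hypothetical. The sketch
shows that the observed address changes with the call site, and that
routing the call through a wrapper makes every caller look like the
wrapper.

```c
#include <assert.h>

/* Volatile sink: gives the functions below a side effect so the compiler
   cannot merge calls or turn them into tail calls. */
static void *volatile last_seen;

/* Hypothetical stand-in for a function that inspects its caller.
   __builtin_return_address is a GCC/Clang extension, not standard C. */
__attribute__((noinline))
static void *who_called_me(void)
{
    void *ra = __builtin_return_address(0);
    last_seen = ra;
    return ra;
}

/* Hypothetical abstraction layer that forwards to who_called_me(). */
__attribute__((noinline))
static void *lookup_wrapper(void)
{
    void *ra = who_called_me();
    last_seen = ra; /* store after the call keeps it out of tail position */
    return ra;
}

/* Two direct call sites yield two different return addresses. */
int call_sites_differ(void)
{
    void *first = who_called_me();
    void *second = who_called_me();
    return first != second;
}

/* Through the wrapper, both callers are attributed to the wrapper. */
int wrapper_hides_caller(void)
{
    void *first = lookup_wrapper();
    void *second = lookup_wrapper();
    return first == second;
}

void demo(void)
{
    assert(call_sites_differ());
    assert(wrapper_hides_caller());
}
```

Calling demo() from main should pass both assertions when built with
GCC or Clang. If the call in lookup_wrapper were compiled as a tail
call, the reported address would instead belong to the wrapper's own
caller, which is the other failure mode mentioned above.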

I would not use "systemd expects it" as evidence that a behavior is
"least surprise".

Rich
