Date: Mon, 22 Jul 2019 11:52:59 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Removing glibc from the musl .2 ABI

On Wed, Jul 17, 2019 at 02:16:51PM -0400, Rich Felker wrote:
> On Wed, Jul 17, 2019 at 01:10:19PM -0500, A. Wilcox wrote:
> > >> Just trying to make sure the community has a clear view of what this
> > >> looks like before we jump in.
> > > 
> > > Yes. This isn't a request to jump in, just looking at feasibility and
> > > whether there'd be interest from your side. Being that ABI-compat
> > > doesn't actually work very well without gcompat right now, though, I
> > > think it might make sense. I'll continue to look at whether there are
> > > other options, possibly just transitional, that might be good too.
> > 
> > I meant: I want a clear view of the boundaries between musl and gcompat,
> > before we (Adélie / the gcompat team) jump in and start designing how we
> > want to handle all the new symbols we may end up with :)
> 
> If we go this route, I would think that gcompat could provide all
> symbols which are not either public APIs (extensions you can
> legitimately use in source) or musl-header-induced ABIs (for example
> things like __ctype_get_mb_cur_max, which is used to define the
> MB_CUR_MAX macro). This would include LFS64 as well as the "__xstat"
> stuff, the other __ctype_* stuff, etc.

I think I'd like to go forward with this. Further work on time64 has
made it apparent to me that the current glibc ABI-compat we have
inside musl is fragile and is imposing unwanted constraints on musl,
which has long been one of the criteria for exclusion. In particular,
consider this situation:

Several structures that are part of public interfaces in musl were
created with extra space reserved for future extension. In some cases
the reserved space was added by musl; in other cases glibc had the
same. However, if we mandate glibc ABI-compat, *all* of this reserved
space is permanently unusable:

- If the reserved space is specific to musl, then reads from it may
  fault, and stores to it may clobber unrelated memory, if the
  structure was allocated by glibc-linked code.

- If the reserved space is present in both musl and glibc, we can't
  make use of it without risking that glibc makes some different use
  of it in the future, making calls from glibc-linked code dangerous.

This came up in the context of structs rusage and timex, but also
applies to stat, sched_param, sysinfo, statvfs, and perhaps others,
which might have reason for wanting extensibility in the future.

Right now, without the glibc ABI-compat constraint, getrusage, wait3,
and wait4 can avoid new time64 remappings entirely (by using the
reserved space we already have in rusage, which glibc doesn't have at
all). [clock_]adjtime[x] hit the second case -- glibc also has
reserved space in timex, but if they end up wanting to use it for
something else and we've put the 64-bit time there, we may be in
trouble.

I don't think the rusage and timex issues here are compelling by
themselves. It's not a big deal to make compat shims here, and I might
still end up doing it. But I think it's indicative that maintaining
glibc ABI-compat in musl is going to become increasingly problematic.

So, what I'd (tentatively; for discussion) like to do:

When ldso loads an application or shared library and detects that it's
glibc-linked (DT_NEEDED for libc.so.6), it both loads a gcompat
library instead *and* flags the dso as needing ABI-compat. The gcompat
library would be permanently RTLD_LOCAL, unable to be used for
resolving global symbols, since it would have to define symbols
conflicting with libc symbol names and with future directions of the
musl ABI.

Symbol lookups when relocating such a flagged dso would take place by
first processing gcompat (logically, adding it to the head of the dso
search list), then the normal symbol search order. The gcompat library
could also provide a replacement dlsym function, so that dlsym calls
from the glibc-linked DSO also follow this order, and a replacement
dlopen, so that dlopen of libc from the glibc-linked DSO would get the
gcompat module.

I'm not sure what mechanism gcompat would then use to make its own
references to the underlying real libc functions. This is something
we'd need to think about.

Before we decide to do it, please be aware that this would put some
additional burden on gcompat beyond what it does now. But it would also
make lots of cases work that fundamentally *can't* work now -- compat
with 32-bit code using the legacy 32-bit off_t functions, compat with
64-bit code using regexec, etc. -- anywhere the musl ABI currently
conflicts with the glibc ABI. Of course much of this is optional. The
new things that would be mandatory would mainly be moving over
existing glibc compat shims (like the __ctype and __xstat stuff) and
implementing converting wrappers where musl's use of reserved space
creates unsafety/incompatibility with the existing glibc code.

Rich
