musl - internal header proposal

Message-ID: <20180907172312.GO1878@brightrain.aerifal.cx>
Date: Fri, 7 Sep 2018 13:23:12 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: internal header proposal

I'm presently working on moving most or all inline-in-source-file
declarations of internal-use interfaces to header files, so that type
mismatches between points of use and points of declaration can be
caught, and so that I can switch them over to hidden visibility
without having to worry about inconsistent application of visibility,
where the following errors are easy to make:

1. Attribute on the definition, but missing from the declaration at
   the site of use: the definition binds at link time, but the caller
   may generate inefficient code that goes through the GOT/PLT
   unnecessarily.

2. Attribute on the declaration at the site of use, but missing from
   the definition: depending on arch and linker version, the linker
   may produce an error about an unsatisfiable relocation and refuse
   to link.

The second is a big problem (regression risk) when applying visibility
to internal interfaces, so a good method to preclude it is needed.
Putting the hidden attribute only on the declaration in the headers,
and omitting it everywhere else, should avoid it entirely, and also
avoids the first problem as long as -Wmissing-declarations passes.
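
A minimal sketch of that arrangement, for illustration only. It
assumes "hidden" is a macro expanding to the GCC/Clang visibility
attribute, and demo_len is a hypothetical name, not an existing musl
interface. The header part and the source-file part are shown in one
block so it compiles on its own:

```c
#include <stddef.h>

/* Assumption: "hidden" is spelled as a macro over the GCC/Clang
 * visibility attribute. */
#define hidden __attribute__((__visibility__("hidden")))

/* --- would live in an internal header --- */
/* The only place the attribute is written. Every caller that
 * includes the header sees it, so error 1 cannot occur. */
hidden size_t demo_len(const char *s);

/* --- would live in the defining source file --- */
/* No attribute here: the definition picks up hidden visibility from
 * the earlier declaration, so the two cannot disagree (error 2).
 * -Wmissing-declarations warns if the header was not included. */
size_t demo_len(const char *s)
{
	size_t n = 0;
	while (s[n]) n++;
	return n;
}
```

Because the attribute is written exactly once, there is no second
spelling that could drift out of sync with it.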

Anyway, it turns out we have roughly two distinct types of internal
interfaces:

1. Namespace-safe versions of standard/public interfaces that allow
   parts of one subsystem to be used to implement another in cases
   where the namespace rules would not allow the normal public
   interfaces to be used. This includes things like pthread functions
   used to implement C11 threads or thread-safety in plain-C
   interfaces, __strchrnul, resolv.h functions used in getaddrinfo,
   mman.h functions used in malloc, etc.

2. Interfaces that are private to a particular subsystem. This
   includes things like the timezone functions from __tz.c and related
   files, all the internal stdio and pthread and locale glue, etc.

The reason I've broken them down into these two categories is that the
latter already have appropriate places to declare them: the
corresponding *_impl.h header files (sometimes named differently) for
their subsystems, but the former don't. Putting the former group in
with the latter would just massively balloon the set of source files
that need to include some *_impl.h header, and thereby obscure which
files are really intended/allowed to poke at internals of a subsystem
vs just needing access to namespace-safe public or semi-public
interfaces from that subsystem.

So, we need a new place to declare the first group, and I have two
possible ways to do it:


Option 1: The big fancy header wrapping

Add a new tree of "wrapper headers" for public headers (let's call it
$(srcdir)/src/include), and -I it before the real public ones
($(srcdir)/include). These new headers include their corresponding
public header (../../include/[self].h) then add anything else that's
supposed to be "public within musl". For example sys/mman.h would have
stuff like:

hidden void __vm_wait(void);
hidden void __vm_lock(void);
hidden void __vm_unlock(void);

hidden void *__mmap(void *, size_t, int, int, int, off_t);
hidden int __munmap(void *, size_t);
hidden void *__mremap(void *, size_t, size_t, int, ...);
hidden int __madvise(void *, size_t, int);
hidden int __mprotect(void *, size_t, int);

hidden const unsigned char *__map_file(const char *, size_t *);

Now, every file that needs to use mman.h functions without violating
namespace can just #include <sys/mman.h> and use the above. If we
wanted, at some point we could even #define the unprefixed names to
remap to the prefixed ones, and only #undef them in the files that
define them, so that everything automatically gets the namespace-safe,
low-call-overhead names. This idea is a lot like how
syscall()/__syscall() work now -- the musl source files get programmed
with familiar interfaces, and a small amount of header magic makes
them do the right thing rather than depending on a public namespace
violation.
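
The optional #define remapping could look roughly like the sketch
below. All names (demo_unmap, internal_demo_unmap, release_region)
are hypothetical stand-ins chosen so the block compiles on its own,
and the function bodies are placeholders; how the public name would
actually be provided in musl is not specified here.

```c
#include <stddef.h>

/* Assumption: "hidden" is a macro over the GCC/Clang attribute. */
#define hidden __attribute__((__visibility__("hidden")))

/* --- stand-in for the real public header --- */
int demo_unmap(void *addr, size_t len);

/* --- what the wrapper header would add on top --- */
hidden int internal_demo_unmap(void *addr, size_t len);
/* Remap the familiar name to the namespace-safe hidden one. */
#define demo_unmap internal_demo_unmap

/* --- an ordinary caller elsewhere in the library --- */
int release_region(void *p, size_t n)
{
	/* Written with the public name; expands to the hidden one. */
	return demo_unmap(p, n);
}

/* --- the defining file undoes the remap to define both names --- */
#undef demo_unmap
int internal_demo_unmap(void *addr, size_t len)
{
	(void)addr;
	return len ? 0 : -1;	/* placeholder behaviour for the sketch */
}
int demo_unmap(void *addr, size_t len)
{
	return internal_demo_unmap(addr, len);
}
```

Only the defining file needs the #undef; every other file keeps using
the unprefixed spelling and never references the public symbol.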

If this all seems too radical, or like it has potential pitfalls we
need to think about before committing to it, I have a less invasive
proposal too:


Option 2: New namespaced.h header

Introduce a single new header that declares all of the namespace-safe
interfaces across all subsystems, with minimal dependencies on other
headers so that it can be included everywhere it's needed with low
cost. Unfortunately some functions need types exposed, but
<sys/types.h> would probably suffice to get just those without pulling
in lots of other stuff.


I think the second option is actually more invasive to the source
tree, in terms of adding #include lines to files. Option 1 has
slightly more hidden complexity, but leads to simplification of the
source, and the complexity does not significantly detract from
readability of the source, in my opinion.

Thoughts on any of this? So far I've been staging commits moving the
subsystem-private internal declarations to appropriate headers (type 2
above), but doing nothing with the namespace-safe versions of public
interfaces (type 1 above). But I'd like to start on them soon too.
When this is all over, I'll be able to add hidden visibility on all of
these, and most of the efficiency lost in having to drop vis.h (see
commit dc2f368e565c37728b0d620380b849c3a1ddd78f) will be regained.
Dynamic linking performance should also slightly increase.

Rich