Date: Tue, 9 Jun 2015 00:27:30 -0400
From: Rich Felker <>
Subject: Re: Build option to disable locale [was: Byte-based C locale,
 draft 1]

On Mon, Jun 08, 2015 at 08:20:26PM -0700, Isaac Dunham wrote:
> On Mon, Jun 08, 2015 at 04:46:42AM +0200, Harald Becker wrote:
> > On 08.06.2015 02:33, Rich Felker wrote:
> > >So aside from iconv, the above seem to total around 19k, and at least
> > >6k of that is mandatory if you want to be able to claim to support
> > >UTF-8. So the topic at hand seems to be whether you can save <13k of
> > >libc.so size by hacking out character handling/locale related features
> > >that are non-essential to basic UTF-8 support...
> > 
> > > I'd like to get a stripped-down version, which eliminates all the
> > > unnecessary char set handling code used in dedicated systems, but
> > > stripping that on every release is too much work to do.
> > 
> > The benefit may be for:
> > 
> > - embedded systems
> > - small initramfs based systems
> > - container systems
> > - minimal chroot environments
> Somehow it sounds like you may not have gotten what Rich was asking.
> IIRC, the goals of musl include full native support for UTF-8; keeping 
> the time complexity to a minimum; and clean, correct code.
> Dropping out 'legacy' charsets doesn't really sacrifice those goals.
> But the other changes have a much bigger impact on them.
> So you're probably going to have to convince Rich that there *is* a
> major benefit ('is' != 'could be').
> For container systems or minimal chroot environments, you're dealing
> with something that doesn't have a hard size limit, and if a chroot
> or container runs ~6 MB ordinarily, you might be able to run 0.3% more
> on the same hardware. That's probably not enough of a case.
> For initramfs-based systems, you've got a similar situation but no
> chance to multiply the effect, unless you're using a VM or hypervisor.
> Now, since embedded systems have hard limits on size, you might be
> able to make a case there. But you will need to come up with something
> more specific, such as "I have a system where I could upgrade the kernel
> to 2.6.xx *if* musl were ~20k smaller than building with a minimal
> iconv" or "If we did this, there would be enough space to switch XYZ
> router firmware from telnetd to dropbear".

Yes, this is roughly what I was saying. Thank you for expressing it
better than I could.

And along those lines, if you really need to minimize for such
a special case, the solution is not manually maintaining extra knobs
and #ifdefs, but changing the way libc.so is generated. Instead of
linking all the object files directly, put them in a .a file first,
then link with something like:

$CC -shared -o libc.so -Wl,-u,sym1 -Wl,-u,sym2 ... libc_so.a

where the list sym1, sym2, ... is generated from 'nm' output for all
the binaries you need to run, plus a few mandatory libc-internal
symbols that need to be linked. This will produce the minimal
libc.so needed for your exact set of programs.
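
As a rough sketch of how the sym1, sym2, ... list could be generated:
this assumes GNU nm's `-D -u` output format (a `U` marker followed by
the symbol name). The function name undef_flags is made up for
illustration, and the mandatory libc-internal symbols mentioned above
are not covered and would still have to be added by hand.

```shell
# Hypothetical helper: read `nm -D -u` output on stdin and print one
# -Wl,-u,SYMBOL flag per distinct undefined symbol, space-separated.
undef_flags() {
    awk '$1 == "U" { print $2 }' |   # keep only undefined-symbol lines
        sed 's/@.*//' |              # drop any @VERSION suffix
        LC_ALL=C sort -u |           # de-duplicate across binaries
        sed 's/^/-Wl,-u,/' |         # turn each name into a linker flag
        paste -sd' ' -               # join everything onto one line
}

# Possible usage (the path is a placeholder for your own binaries):
#   nm -D -u /path/to/your/binaries/* | undef_flags
```

The printed flags would go where sym1, sym2, ... appear in the link
command. File-name headers and blank lines that nm emits for multiple
inputs are ignored because they do not start with `U`.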

In the specific case of UTF-8 and locale-related code, I believe that
if none of your programs call setlocale or use any of the wchar
functions, regex/fnmatch/glob, or iconv explicitly, the only code that
we discussed that would get linked into libc.so is mbtowc.c and
wcrtomb.c, for a total of about 550 bytes. Even these would be omitted
if you don't use printf or scanf (printf needs wcrtomb; scanf needs
mbtowc). Using fnmatch/glob/regex would pull in another ~9k for the
character class and case mapping functions.
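
To check whether a set of programs falls into this case, something
like the following could be used. It is only a sketch: the function
name locale_users is hypothetical, and the name pattern is an
approximation of "setlocale, the wchar functions, regex/fnmatch/glob,
iconv", not an authoritative list of what pulls in the extra code.

```shell
# Hypothetical helper: read `nm -D -u` output on stdin and list the
# undefined symbols that look like locale/wchar/regex/glob/iconv users.
# The pattern below is an assumption, not a complete list.
locale_users() {
    awk '$1 == "U" { print $2 }' |
        grep -E '^(setlocale|iconv(_open|_close)?|fnmatch|glob|reg(comp|exec)|wc[a-z0-9]+|wmem[a-z]+|isw[a-z]+|tow[a-z]+)$' |
        LC_ALL=C sort -u
}

# Possible usage (the path is a placeholder for your own binaries):
#   nm -D -u /path/to/your/binaries/* | locale_users
```

Empty output would suggest that none of the programs reference these
functions directly.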
