musl - Re: Locale support considered harmful noise

Message-ID: <20200219033604.GZ1663@brightrain.aerifal.cx>
Date: Tue, 18 Feb 2020 22:36:04 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Locale support considered harmful noise

On Tue, Feb 18, 2020 at 07:38:29PM +0000, Jacob Welsh wrote:
> Hello,
> 
> In TMSR we've made extensive use of musl, due to the very welcome
> dose of clear and concise code it provides as compared to the
> competition [1]. For example we have a static Ada compiler [2], the
> Bitcoin reference implementation [3], a reproducible and
> self-contained Gentoo system [4], and not least of all my own
> distribution [5] used in my consulting business [6].
> 
> However, the apparent goal of aggressive expansion of Unicode and
> localization "features" in musl sets off alarms; for instance, on
> the roadmap [7] I see:

I think you're rather under-informed on this topic. Basically none of
the following add any complexity:

> >Unicode 12.1 update and related character handling work

This was (1) an update of existing tables and (2) throwing out
hand-written case mapping code that made lots of fragile assumptions,
had to be updated by hand with every addition of new case mappings,
and got slower with each addition. It was replaced with a table-based
approach I'd designed a year or so ago that's more like the rest of
the character tables and admits automatic generation.
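
As a rough illustration of what "table-based" means here, the following is a
minimal sketch of case mapping driven by a sorted range table with binary
search. It is not musl's actual table layout or code. The names
case_range, to_upper_table and sketch_toupper are hypothetical, and the
table holds only a handful of ranges for demonstration. The point is that
adding a mapping means adding a table row, which a generator script could
emit, rather than editing branching logic by hand.

```c
#include <stddef.h>

/* Hypothetical illustration only; not musl's real data structures. */
struct case_range {
    unsigned first;  /* first code point in the range */
    unsigned last;   /* last code point in the range, inclusive */
    int delta;       /* added to a code point to get its uppercase form */
};

/* Sorted by 'first'. A real table would be generated from Unicode data. */
static const struct case_range to_upper_table[] = {
    { 0x0061, 0x007A, -32 },  /* U+0061..U+007A -> U+0041..U+005A */
    { 0x00E0, 0x00F6, -32 },  /* U+00E0..U+00F6 -> U+00C0..U+00D6 */
    { 0x00F8, 0x00FE, -32 },  /* U+00F8..U+00FE -> U+00D8..U+00DE */
    { 0x03B1, 0x03C1, -32 },  /* U+03B1..U+03C1 -> U+0391..U+03A1 */
    { 0x0430, 0x044F, -32 },  /* U+0430..U+044F -> U+0410..U+042F */
};

/* Binary search over the ranges; lookup cost grows with log(table size)
 * instead of with the number of hand-written special cases. */
unsigned sketch_toupper(unsigned c)
{
    size_t lo = 0;
    size_t hi = sizeof to_upper_table / sizeof to_upper_table[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct case_range *r = &to_upper_table[mid];

        if (c < r->first)
            hi = mid;
        else if (c > r->last)
            lo = mid + 1;
        else
            return c + r->delta;
    }
    return c;  /* no mapping in the table: return the input unchanged */
}
```

Code points that fall between ranges, such as U+00F7 (division sign), are
returned unchanged because no row covers them.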

> >Locale support overhaul.

This is not adding anything new but fixing bugs where the code that's
already there doesn't work as intended.

> >Hostname resolver support for non-ASCII domains (IDN)
> 
> >LC_COLLATE support for collation orders other than simple codepoint order

These have been serious missing functionality since the beginning.
There is no change here. If you missed them being on the roadmap for
the past 6+ years, you weren't looking very closely.
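
To make "simple codepoint order" concrete: comparing valid UTF-8 strings
byte by byte, as strcmp does, yields the same order as comparing their
code points. The sketch below sorts a few words that way. The function
name sort_codepoint_order is hypothetical and the word list is made up
for illustration. The result places every uppercase ASCII letter before
every lowercase one and every accented letter after "z", which is the
behavior that locale-aware collation is meant to replace.

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback: strcmp compares bytes as unsigned char, which for
 * valid UTF-8 is equivalent to comparing code points. */
static int cmp_codepoint(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort an array of UTF-8 strings in code point order. */
void sort_codepoint_order(const char **words, size_t n)
{
    qsort(words, n, sizeof *words, cmp_codepoint);
}
```

Given "apple", "Zebra" and "\xc3\xa9" "clair" (the UTF-8 encoding of a
word starting with U+00E9), this sorts "Zebra" first and the accented
word last, regardless of what any language's dictionary order would be.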

> >Support for LC_MONETARY and LC_NUMERIC properties.

This is the only item that's controversial, but you don't seem to be
coming from a good position to have input on it.

> >Message translation support for dynamic linker

This has also been on the agenda for a long time. The dynamic linker
is the only place in musl where format strings containing
natural-language text are used, and format strings are not candidates
for translation because doing so is unsafe (translation data can
replace format specifiers with incompatible ones). That leaves the
dynamic linker inconsistent with the rest of musl, which does have
message translation support.
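
The hazard with translated format strings can be sketched as follows. If a
translation catalog supplies the format string, nothing stops it from
containing "%s" where the code passes an int. The check below compares the
conversion specifiers of an original and a candidate translation. It is a
simplified sketch under stated assumptions, not anything from musl: the
names next_conv and same_conversions are hypothetical, positional "%1$s"
forms are not handled, and the example messages in the usage note are
invented.

```c
#include <stddef.h>
#include <string.h>

/* Advance *s past the next conversion specification. Returns the
 * conversion character and stores up to 3 length-modifier characters
 * in mod. Returns 0 when no further complete conversion exists. */
static int next_conv(const char **s, char mod[4])
{
    const char *p = *s;

    for (;;) {
        size_t span, n;
        int c;

        p = strchr(p, '%');
        if (!p)
            return 0;
        p++;
        if (*p == '%') {        /* "%%" is a literal percent sign */
            p++;
            continue;
        }
        p += strspn(p, "-+ #0");               /* flags */
        p += strspn(p, "0123456789*");         /* field width */
        if (*p == '.')                         /* precision */
            p += 1 + strspn(p + 1, "0123456789*");
        span = strspn(p, "hljztL");            /* length modifier */
        n = span < 3 ? span : 3;
        memcpy(mod, p, n);
        mod[n] = '\0';
        p += span;
        if (!*p)
            return 0;           /* truncated specification */
        c = (unsigned char)*p++;
        *s = p;
        return c;
    }
}

/* Returns 1 if both strings use the same conversions, with the same
 * length modifiers, in the same order; otherwise returns 0. */
int same_conversions(const char *orig, const char *xlat)
{
    char m1[4], m2[4];

    for (;;) {
        int c1 = next_conv(&orig, m1);
        int c2 = next_conv(&xlat, m2);

        if (c1 != c2)
            return 0;
        if (!c1)
            return 1;           /* both strings exhausted together */
        if (strcmp(m1, m2) != 0)
            return 0;
    }
}
```

With this sketch, same_conversions("%d symbols not found", "%s symbols
not found") returns 0, flagging a translation that would make printf read
an int as a pointer. Avoiding natural-language format strings altogether
removes the need for any such check.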

> >Locale data and libc message translations

This is purely a matter of creating data to be used with functionality
that already exists.

> We think this is such a bad idea that it threatens to undermine
> musl's otherwise substantial virtues. This kind of bloat imposes
> real costs on the users that matter - namely the literate ones, who
> value predictable, stable and bug-free code - in exchange for
> entirely unclear benefits.

If you think the above imply bloat, musl must already be bloated.

You should probably be aware that first-class support for all
characters in Unicode (vs glibc's bloated gconv-plugin layer for UTF-8
which originally made GNU grep over 100x slower than in 8-bit codepage
locales) was _THE_ original motivation for what became musl. None of
this is new. Not treating users like they're "illiterate" if they want
to be able to write their own name has always been the most important
core value of the project, and your attitude towards the matter here
does not make me interested in going out of my way to cater to you. I
suspect others in this community feel similarly.

> Especially considering the rate at which bugs are still turning up,
> there is no justification for this added complexity. In any event we
> will not be using "upgrades" that import additional nonsense into
> this critical system component.

If you want to stick with old versions and maintain them yourself or
pay someone else to do so, that's your choice.

Rich