musl - Re: Draft proposed locale changes

Message-ID: <CAF1WSuzYsm81k=oCz9_O9+2_h5nLuQGzuZCEMb3Hg=dzb6tt5A@mail.gmail.com>
Date: Mon, 5 Mar 2018 21:42:49 +0300
From: "Konstantin P." <ria.freelander@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Draft proposed locale changes

Could you publish an official .po file for musl after the proposed changes?

On Mon, Mar 5, 2018 at 9:39 PM, Rich Felker <dalias@...c.org> wrote:

>
> localeconv/LC_NUMERIC/LC_MONETARY
>
> Each loaded locale needs an immutable lconv structure to represent
> this data. It needs to be allocated with the locale (at locale loading
> time) since localeconv() has no provision for failure, but we can wait
> to populate it lazily, and we can put the code to populate it in
> localeconv.c so that static-linked programs that don't use this
> rarely-used interface don't have to pay for it. We could also omit
> even allocating it (56 bytes on 32-bit archs, 96 on 64-bit) if localeconv.o is not linked, but
> it's probably not worth the special-casing code to do that.
>
> The localeconv structure should be part of struct __locale_map, not
> struct __locale_struct, since it's a pure function of the data in the
> memory-mapped locale file and not a function of how that data is
> linked to a specific locale category. Putting it in __locale_struct
> would just complicate setlocale and newlocale.
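A minimal C sketch of the lazy scheme described above. It assumes a hypothetical struct locale_map_sketch in place of the real struct __locale_map, and a hypothetical populate_lconv() that installs C-locale values instead of reading a mapped file. The lconv storage lives in the map, so nothing is allocated at localeconv() time; a small atomic state machine lets concurrent first callers agree on which one fills it in.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stdatomic.h>
#include <string.h>

/* Hypothetical stand-in for struct __locale_map. The lconv storage is
 * allocated together with the map, so localeconv() never allocates. */
struct locale_map_sketch {
	const char *name;
	atomic_int lconv_state;  /* 0 = empty, 1 = being filled, 2 = ready */
	struct lconv lconv;
};

/* Hypothetical: a real version would read the mapped locale file.
 * This one just installs C-locale values. */
static void populate_lconv(struct lconv *lc)
{
	static char empty[] = "", dot[] = ".";
	/* Every char-valued member becomes CHAR_MAX ("not available")... */
	memset(lc, CHAR_MAX, sizeof *lc);
	/* ...then every pointer member is overwritten. */
	lc->decimal_point = dot;
	lc->thousands_sep = lc->grouping = empty;
	lc->int_curr_symbol = lc->currency_symbol = empty;
	lc->mon_decimal_point = lc->mon_thousands_sep = empty;
	lc->mon_grouping = lc->positive_sign = lc->negative_sign = empty;
}

/* Return the map's lconv, filling it in on first use. */
static const struct lconv *map_lconv(struct locale_map_sketch *m)
{
	int expected = 0;
	assert(m != NULL);
	if (atomic_load_explicit(&m->lconv_state, memory_order_acquire) == 2)
		return &m->lconv;
	if (atomic_compare_exchange_strong(&m->lconv_state, &expected, 1)) {
		populate_lconv(&m->lconv);
		atomic_store_explicit(&m->lconv_state, 2, memory_order_release);
	} else {
		/* Another thread is filling it in; wait until it is done. */
		while (atomic_load_explicit(&m->lconv_state, memory_order_acquire) != 2)
			;
	}
	return &m->lconv;
}
```

Since the struct is written once and never modified afterwards, the returned pointer stays valid and immutable for the lifetime of the map.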
>
> The obvious (but not terribly efficient) form for the data in the
> locale file is to have each lconv field as a mo-level key, as in:
>
>         msgid "int_frac_digits"
>         msgstr "2"
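One way the per-field keys could be read, as a sketch: struct mo_pair and mo_lookup() are hypothetical stand-ins for the real catalog lookup (a linear scan here, for brevity), and only the C89 char-valued fields are covered. Missing or malformed values fall back to CHAR_MAX, which struct lconv uses to mean "not available".

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the mo file's msgid/msgstr pairs. */
struct mo_pair { const char *msgid, *msgstr; };

/* Linear scan for brevity; the mo lookup described in the text is O(log n). */
static const char *mo_lookup(const struct mo_pair *cat, size_t n, const char *key)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(cat[i].msgid, key) == 0)
			return cat[i].msgstr;
	return NULL;
}

/* One mo-level key per char-valued lconv field (C89 subset). */
static const struct { const char *key; size_t off; } char_fields[] = {
	{ "int_frac_digits", offsetof(struct lconv, int_frac_digits) },
	{ "frac_digits",     offsetof(struct lconv, frac_digits) },
	{ "p_cs_precedes",   offsetof(struct lconv, p_cs_precedes) },
	{ "p_sep_by_space",  offsetof(struct lconv, p_sep_by_space) },
	{ "n_cs_precedes",   offsetof(struct lconv, n_cs_precedes) },
	{ "n_sep_by_space",  offsetof(struct lconv, n_sep_by_space) },
	{ "p_sign_posn",     offsetof(struct lconv, p_sign_posn) },
	{ "n_sign_posn",     offsetof(struct lconv, n_sign_posn) },
};

static void load_char_fields(struct lconv *lc, const struct mo_pair *cat, size_t n)
{
	assert(lc != NULL);
	for (size_t i = 0; i < sizeof char_fields / sizeof char_fields[0]; i++) {
		char *field = (char *)lc + char_fields[i].off;
		const char *s = mo_lookup(cat, n, char_fields[i].key);
		char *end = NULL;
		long v = s ? strtol(s, &end, 10) : -1;
		/* Missing, empty or malformed values mean "not available". */
		if (s && end != s && *end == '\0' && v >= 0 && v < CHAR_MAX)
			*field = (char)v;
		else
			*field = CHAR_MAX;
	}
}
```

The key-per-field table is what keeps field order out of the file format: adding a field means adding a key, not shifting a packed layout.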
>
> A more compact form could pack them all into a single key, but then the
> field order would become a hidden locale-file interface boundary (ABI).
>
> For the string fields it's necessary that they each be in-place
> strings in the mo file. grouping and mon_grouping also have the
> special constraint that they need to vary by whether the arch uses
> signed or unsigned plain char (since CHAR_MAX has a special meaning in
> them), so the mo file needs to store both versions. That's ugly, but I don't see
> any good way around it. We can probably punt on this for now just by
> not supporting grouping (i.e. only supporting locale definitions that
> don't do grouping), since it's not implemented anyway.
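To see why CHAR_MAX matters here, a sketch of how a grouping string is interpreted under the C standard's rules (count_separators() is a hypothetical helper): each element is a group size, the 0 terminator means "repeat the previous size", and an element equal to CHAR_MAX stops further grouping. That stop byte must be stored as 127 for signed-char ABIs and 255 for unsigned-char ones; a byte written for the other signedness would not compare equal to CHAR_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Count the separators needed for an integer part of ndigits digits:
 *   element != 0 and != CHAR_MAX : size of the current group
 *   element == 0 (terminator)    : repeat the previous size
 *   element == CHAR_MAX          : no further grouping                  */
static int count_separators(const char *grouping, int ndigits)
{
	int seps = 0, size = 0;
	const char *g = grouping;
	assert(ndigits >= 0);
	while (ndigits > 0) {
		if (*g == CHAR_MAX)
			break;
		if (*g != 0)
			size = *g++;    /* advance only past explicit sizes */
		if (size <= 0 || ndigits <= size)
			break;
		ndigits -= size;
		seps++;
	}
	return seps;
}
```

For example, with grouping "\3" the 7-digit value 1234567 gets two separators (1,234,567), while { 3, CHAR_MAX, 0 } gives only one (1234,567).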
>
> If we support decimal_point, it should not go through the localeconv
> mechanism since it would always be needed by printf and strtod.
> Instead __get_locale should probe it right away and set a 1-bit flag
> in the __locale_map structure for these functions to consume (one bit
> suffices, since previous research found that '.' and ',' are the only
> values in use).
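A sketch of the one-bit flag, assuming a hypothetical decimal_comma bit-field in the map and a probe that receives the looked-up decimal_point string (NULL if the locale file has none). A null map stands for the C locale, so printf and strtod pay only a bit test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct __locale_map, showing only the flag. */
struct locale_map_sketch {
	unsigned decimal_comma : 1;
};

/* Called once at locale load time with the file's decimal_point value. */
static void probe_decimal_point(struct locale_map_sketch *m, const char *value)
{
	assert(m != NULL);
	m->decimal_comma = value != NULL && strcmp(value, ",") == 0;
}

/* What printf and strtod would consume: a NULL map means the C locale. */
static char radix_char(const struct locale_map_sketch *m)
{
	return m && m->decimal_comma ? ',' : '.';
}
```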
>
>
>
> nl_langinfo/LC_TIME/etc.
>
> Eliminate the currently-present wrong values for ERA* and related
> LC_TIME stuff; that gets rid of all ambiguous translation keys except
> "May". Bikeshed up some alternate key for May.
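A quick check with the real POSIX nl_langinfo() interface shows why "May" is the one key left: in the C locale the abbreviated and full names of the fifth month are the same string, so a key based on the English text cannot tell ABMON_5 from MON_5. month_key_ambiguous() is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <langinfo.h>
#include <string.h>

/* Nonzero if the abbreviated and full names of month m (1-12) are the
 * same string in the current locale. */
static int month_key_ambiguous(int m)
{
	static const nl_item abmon[12] = {
		ABMON_1, ABMON_2, ABMON_3, ABMON_4, ABMON_5, ABMON_6,
		ABMON_7, ABMON_8, ABMON_9, ABMON_10, ABMON_11, ABMON_12,
	};
	static const nl_item mon[12] = {
		MON_1, MON_2, MON_3, MON_4, MON_5, MON_6,
		MON_7, MON_8, MON_9, MON_10, MON_11, MON_12,
	};
	char ab[64];
	assert(m >= 1 && m <= 12);
	/* Copy first: a later nl_langinfo() call may overwrite its result. */
	strncpy(ab, nl_langinfo(abmon[m - 1]), sizeof ab - 1);
	ab[sizeof ab - 1] = '\0';
	return strcmp(ab, nl_langinfo(mon[m - 1])) == 0;
}
```

A program starts in the C locale, so month_key_ambiguous(5) is true and month_key_ambiguous(1) is false ("Jan" vs "January").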
>
>
>
> strerror/LC_MESSAGES
>
> Not sure yet. One radical idea I kinda like is removing all the
> English-phrase messages from libc core and just having strerror
> produce strings like "ENOENT", "EPERM", etc. in the C locale. This
> seems to be the only option that wouldn't either moderately increase
> libc size or require translation files to match the exact current text
> in the built-in English libc messages. Users who want the current
> messages would then need an "en" locale with contents like:
>
>         msgid "ENOENT"
>         msgstr "No such file or directory"
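A sketch of what option zero could look like, with hypothetical names throughout (strerror_sketch(), an excerpted errno_names[] table, and catalog_fn for the translation lookup): the C locale gets the bare E* name, and an "en" catalog like the one above restores the familiar text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog lookup: the msgstr for msgid, or NULL. */
typedef const char *(*catalog_fn)(const char *msgid);

/* Short excerpt; a real table would cover every errno value. */
static const char *const errno_names[] = {
	[EPERM] = "EPERM",
	[ENOENT] = "ENOENT",
	[EINTR] = "EINTR",
	[EIO] = "EIO",
	[EACCES] = "EACCES",
};

/* Option zero: the C locale yields the E* name; a catalog may map it. */
static const char *strerror_sketch(int e, catalog_fn lookup)
{
	const char *name = NULL, *t;
	if (e >= 0 && (size_t)e < sizeof errno_names / sizeof errno_names[0])
		name = errno_names[e];
	if (name == NULL)
		return "(unknown)";  /* arbitrary placeholder for this sketch */
	t = lookup ? lookup(name) : NULL;
	return t ? t : name;
}

/* Hypothetical "en" catalog mirroring the msgid/msgstr pair above. */
static const char *en_catalog(const char *msgid)
{
	if (strcmp(msgid, "ENOENT") == 0) return "No such file or directory";
	if (strcmp(msgid, "EPERM") == 0) return "Operation not permitted";
	return NULL;
}
```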
>
> If we don't want this, the possible solutions seem to be:
>
> 1. Prepending the error code and a null byte (e.g. "ENOENT\0") to all
> the existing error strings, then skipping past it if the translation
> was not found.
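Option 1 relies on a C string literal being able to embed a NUL: read as a C string, each entry is just the key, and the English fallback starts one byte past its end. A sketch with hypothetical names (messages[], strerror_opt1(), demo_catalog()):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog lookup: the msgstr for msgid, or NULL. */
typedef const char *(*catalog_fn)(const char *msgid);

/* Each entry is "NAME\0English text". A message starting with a digit
 * 0-7 would merge into the "\0" escape; it would need to be written as
 * two adjacent literals, "NAME\0" "7...". */
static const char *const messages[] = {
	[EPERM] = "EPERM\0Operation not permitted",
	[ENOENT] = "ENOENT\0No such file or directory",
};

static const char *strerror_opt1(int e, catalog_fn lookup)
{
	const char *s = NULL, *t;
	if (e >= 0 && (size_t)e < sizeof messages / sizeof messages[0])
		s = messages[e];
	if (s == NULL)
		return "No error information";  /* fallback for unlisted codes */
	t = lookup ? lookup(s) : NULL;      /* key: everything before the NUL */
	return t ? t : s + strlen(s) + 1;   /* otherwise skip "NAME\0" */
}

/* Hypothetical translation catalog that only knows ENOENT. */
static const char *demo_catalog(const char *msgid)
{
	return strcmp(msgid, "ENOENT") == 0 ? "translated ENOENT" : NULL;
}
```

The cost is just the key bytes plus one NUL per entry, which is where the "about 1k" size increase comes from.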
>
> 2. Putting a second version of strerror in locale_map.c with the E*
> names in it, so it's only linked if you use locale. I strongly dislike
> this approach because it greatly increases the marginal size cost of
> doing the right thing (calling setlocale) and imposes the cost even if
> you don't use strerror at all (only setlocale).
>
> 3. Accepting that translations need to match (and perpetually be
> updated to match) the error strings in musl's __strerror.h. I don't
> like this much either.
>
> So I think the choice should be between option 1 and "option zero"
> (the radical idea above). Option zero decreases the size of libc by
> nearly 1k (removing messages) but changes the behavior; option 1
> increases the size of libc by about 1k.
>
>
>
> LC_COLLATE
>
> No specific proposal yet. We need a data structure to map characters
> and sequences of characters to collating elements. Obviously the mo
> file's lookups could be used directly (O(log n), with a better average
> case if we ever add hash table support), but they might be heavier than we
> want. The alternative would be having a gigantic string in the mo file
> that's just "compiled" collation table data, but unless it's
> well-designed that seems like an undesirable permanent interface
> boundary.
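A toy sketch of the first alternative, using direct sorted-key lookups to find the longest collating element at each position. Here bsearch() over a strcmp-sorted table stands in for the mo file's O(log n) lookup; the table, its weights and MAX_SEQ are hypothetical, and only a single (primary) weight level is modeled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table, sorted by strcmp like a mo file's msgids.
 * "ch" is one collating element that sorts after "h". */
struct coll_entry { const char *seq; int weight; };
static const struct coll_entry table[] = {
	{ "a", 1 }, { "b", 2 }, { "c", 3 }, { "ch", 9 }, { "h", 8 }, { "i", 10 },
};
#define MAX_SEQ 2  /* longest sequence in the table */

static int cmp_entry(const void *key, const void *elem)
{
	const struct coll_entry *e = elem;
	return strcmp(key, e->seq);
}

/* Weight of the longest element at *s (which must be non-empty);
 * advances *s past it. Each candidate costs one O(log n) search. */
static int next_element(const char **s)
{
	char buf[MAX_SEQ + 1];
	size_t avail = 0;
	int w;
	while (avail < MAX_SEQ && (*s)[avail] != '\0')
		avail++;
	for (size_t len = avail; len > 0; len--) {
		const struct coll_entry *e;
		memcpy(buf, *s, len);
		buf[len] = '\0';
		e = bsearch(buf, table, sizeof table / sizeof table[0],
		            sizeof table[0], cmp_entry);
		if (e) {
			*s += len;
			return e->weight;
		}
	}
	w = 1000 + (unsigned char)**s;  /* unlisted byte: sort after the table */
	*s += 1;
	return w;
}

static int coll_cmp(const char *a, const char *b)
{
	while (*a && *b) {
		int wa = next_element(&a), wb = next_element(&b);
		if (wa != wb)
			return wa < wb ? -1 : 1;
	}
	return (*a != '\0') - (*b != '\0');
}
```

Unlike strcmp, this orders "h" before "ch" and "ci" before "ch". Every position may cost up to MAX_SEQ lookups, which is the kind of overhead that could make direct mo lookups heavier than a precompiled table.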
>
>
