Date: Mon, 5 Mar 2018 13:39:50 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Draft proposed locale changes

localeconv/LC_NUMERIC/LC_MONETARY

Each loaded locale needs an immutable lconv structure to represent
this data. It needs to be allocated with the locale (at locale loading
time) since localeconv() has no provision for failure, but we can wait
to populate it lazily, and we can put the code to populate it in
localeconv.c so that static-linked programs that don't use this
rarely-used interface don't have to pay for it. We could also omit
even allocating it (56/96 bytes) if localeconv.o is not linked, but
it's probably not worth the special-casing code to do that.

The localeconv structure should be part of struct __locale_map, not
struct __locale_struct, since it's a pure function of the data in the
memory-mapped locale file and not a function of how that data is
linked to a specific locale category. Putting it in __locale_struct
would just complicate setlocale and newlocale.

The obvious (but not terribly efficient) form for the data in the
locale file is to have each lconv field as a mo-level key, as in:

msgid "int_frac_digits"
msgstr "2"

A more compact form could pack them all into one, but then the order
becomes a hidden locale-file interface boundary/ABI. For the string
fields it's necessary that they each be in-place strings in the mo
file.

grouping and mon_grouping also have the special constraint that they
need to vary by whether the arch uses signed or unsigned plain-char
(since CHAR_MAX has special meaning) so the mo file needs to store
both versions. That's ugly but I don't see any good way around it. We
can probably punt on this for now just by not supporting grouping
(i.e. only supporting locale definitions that don't do grouping),
since it's not implemented anyway.

If we support decimal_point, it should not go through the localeconv
mechanism since it would always be needed by printf and strtod.
Instead __get_locale should probe it right away and set a 1-bit flag
in the __locale_map structure for these functions to consume (1-bit
based on previous research that [.,] are the only values).

nl_langinfo/LC_TIME/etc.

Eliminate the currently-present wrong values for ERA* and related
LC_TIME stuff; that gets rid of all ambiguous translation keys except
"May". Bikeshed up some alternate key for May.

strerror/LC_MESSAGES

Not sure yet. One radical idea I kinda like is removing all the
English-phrase messages from libc core and just having strerror
produce strings like "ENOENT", "EPERM", etc. in the C locale. This
seems to be the only option that wouldn't either moderately increase
libc size or require translation files to match the exact current text
in the builtin English libc messages. Users who want the current
messages would then need an "en" locale with contents like:

msgid "ENOENT"
msgstr "No such file or directory"

If we don't want this, the possible solutions look like one of:

1. Prepending the error code and a null byte (e.g. "ENOENT\0") to all
   the existing error strings, then skipping past it if the
   translation was not found.

2. Putting a second version of strerror in locale_map.c with the E*
   names in it, so it's only linked if you use locale. I strongly
   dislike this approach because it greatly increases the marginal
   size cost of doing the right thing (calling setlocale) and imposes
   the cost even if you don't use strerror at all (only setlocale).

3. Accepting that translations need to match (and perpetually be
   updated to match) error strings in musl __strerror.h. I don't like
   this much either.

So I think it should be between options 1 and "zero" above. Option
zero decreases the size of libc by nearly 1k (removing messages) but
changes the behavior. Option 1 increases the size of libc by about 1k.

LC_COLLATE

No specific proposal yet. We need a data structure to map characters
and sequences of characters to collating elements.
Obviously the mo file's lookups could be used directly (O(log n),
improved avg case if we ever add hash table support) but they might be
heavier than we want. The alternative would be having a gigantic
string in the mo file that's just "compiled" collation table data, but
unless it's well-designed that seems like an undesirable permanent
interface boundary.
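
To make option 1 from the strerror/LC_MESSAGES section concrete, here
is a rough sketch of the mechanics. It is not actual musl code: the
names err_enoent, err_eperm, lookup_translation and error_text are
hypothetical, the stub stands in for whatever mo-file lookup the locale
code would really perform, and the German string is only an
illustrative translation. Each builtin string carries the E* name as a
prefix terminated by a null byte, the prefix serves as the translation
key, and the English text after the null byte is the fallback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table entries in the "NAME\0message" layout of option 1. */
static const char err_enoent[] = "ENOENT\0No such file or directory";
static const char err_eperm[]  = "EPERM\0Operation not permitted";

/* Hypothetical stand-in for the mo-file lookup: returns the translation
 * for key, or NULL if the loaded locale has none. Here only ENOENT is
 * "translated", to exercise both paths. */
static const char *lookup_translation(const char *key)
{
	if (!strcmp(key, "ENOENT"))
		return "Datei oder Verzeichnis nicht gefunden";
	return NULL;
}

/* Use the E* prefix as the key; if no translation is found, skip past
 * the prefix and its null terminator to reach the builtin English text. */
static const char *error_text(const char *entry)
{
	const char *trans = lookup_translation(entry);
	if (trans)
		return trans;
	return entry + strlen(entry) + 1;
}
```

With this layout, error_text(err_eperm) falls back to the builtin
English text, while error_text(err_enoent) returns whatever the locale
provides. The size cost is the prefix plus one null byte per message,
which is where the roughly 1k increase for option 1 would come from.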