Date: Thu, 1 Mar 2018 15:45:08 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: setlocale behavior with 'missing' locales

On Thu, Mar 01, 2018 at 02:25:45PM -0500, Rich Felker wrote:
> On Thu, Mar 01, 2018 at 01:10:47PM -0600, William Pitcock wrote:
> > >> One notable issue is that, right now, we rely on being able to set
> > >> LC_MESSAGES to an arbitrary name even if there's no libc locale
> > >> definition for it; this is because gettext() relies on the name of the
> > >> current LC_MESSAGES locale to find (application-specific) translation
> > >> files that might exist even without a libc translation. I'm not sure
> > >> how we would best keep this working under changes similar to the
> > >> above.
> > >
> > > Any further thoughts on this? I'd like to begin addressing these
> > > issues in this release cycle.
> > >
> > > I think the above plan works (is conforming, doesn't break things)
> > > except for the LC_MESSAGES issue mentioned at the end. I still don't
> > > have any good ideas for dealing with that. Really, since gettext can
> > > be used with any category, not just LC_MESSAGES (although LC_MESSAGES
> > > is the normal choice), it applies to all categories. Maybe we could
> > > still use the ("nonexistent") requested locale name in this case, or
> > > some derivative of it that clarifies that it's synthesized...?
> >
> > +1 to using this approach.
> >
> > We could use a locale name such as "en_US@...tual.UTF-8".
> >
> > glibc uses this style of locale name for locales such as UK English
> > with eurozone LC_MONETARY: en_UK@...o.UTF-8.
>
> I was actually just in the process of trying to work out something
> very similar. Here's how I think it might work:
>
> setlocale(cat, "") -- always succeeds, produces ll_TT@...tual (or
> ll_TT@...sing was my idea) if a locale file by the matching name is
> not found.
>
> setlocale(cat, "ll_TT@...tual") (or whatever name) - always succeeds.
>
> setlocale(cat, "ll_TT[@other]") - succeeds only if a file matching the
> name is found.
>
> One thing I don't entirely like is repurposing the @ modifier for
> this; it conflicts with (and perhaps fails to preserve) an existing
> modifier if there is one, and affects how the search for gettext
> translation files would happen (searching extra @virtual paths).
> Perhaps we should instead make it a separate component delimited in
> some other way so it can always be dropped by gettext.

Implementation notes if we do this:

__get_locale is the internal backend that loads locale maps, and looks
like the point at which this all should be implemented. Presently
__get_locale has no means to return an error; a null return value
indicates the C locale, which is represented everywhere by the lack of
any locale map.

It seems __get_locale has all the information it needs to decide how
to act:

- If the argument is "", missing/virtual locale synthesis should
  happen. If allocation failures etc. prevent synthesis, it should
  behave as if the argument had been "C.UTF-8".

- If the argument is one of the builtin locales (C/C.UTF-8/POSIX), it
  can return one of the builtin maps. Right now it oddly replaces
  "C.UTF-8" with just plain "C" (null return value) in all categories
  except LC_CTYPE. This behavior should perhaps be revisited, but
  newlocale.c and perhaps other places encode assumptions that it's
  done this way.

- If the argument is another name that can't be found, an error should
  be returned to the caller somehow. We could perhaps use MAP_FAILED.
  The alternative seems to be reworking the contract so that null
  doesn't mean C and either using a real locale_map object for the C
  locale or translating to null in the caller, but these choices seem
  to impose worse costs/effects elsewhere.

None of the above covers anything about _how_ the synthesis of names
for missing locales should happen, just where/when it should happen.

Rich
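A minimal C sketch of the three proposed setlocale rules, for illustration only. Everything in it is hypothetical: proposed_setlocale, is_builtin, have_file and has_marker are invented names, the NULL-terminated list of installed names stands in for the real locale file lookup, env_name stands in for the LC_ALL/LC_*/LANG resolution done for "", and "@virtual" is only a placeholder marker, since the thread leaves the final spelling open. Treating C, C.UTF-8 and POSIX as always available follows the implementation notes about builtin locales.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Placeholder marker for synthesized names (hypothetical; the thread
 * has not settled on a spelling). */
#define SYNTH_MARKER "@virtual"

/* Builtin locales never need a locale file. */
static bool is_builtin(const char *name)
{
    return !strcmp(name, "C") || !strcmp(name, "C.UTF-8") ||
           !strcmp(name, "POSIX");
}

/* Hypothetical stand-in for "a locale file by this name was found":
 * avail is a NULL-terminated list of installed locale names. */
static bool have_file(const char *name, const char *const *avail)
{
    for (; *avail; avail++)
        if (!strcmp(*avail, name)) return true;
    return false;
}

/* True if name ends with the synthesized-name marker. */
static bool has_marker(const char *name)
{
    size_t n = strlen(name), m = strlen(SYNTH_MARKER);
    return n >= m && !strcmp(name + n - m, SYNTH_MARKER);
}

/* Models the outcome of setlocale(cat, name) under the proposal.
 * Returns false where setlocale would fail; otherwise writes the
 * resulting locale name to out. */
static bool proposed_setlocale(const char *name, const char *env_name,
                               const char *const *avail,
                               char *out, size_t outsz)
{
    const char *suffix = "";
    assert(name && env_name && avail && out);

    if (!*name) {
        /* Rule 1: "" always succeeds, synthesizing a name if no file. */
        name = env_name;
        if (!is_builtin(name) && !has_marker(name) && !have_file(name, avail))
            suffix = SYNTH_MARKER;
    } else if (!is_builtin(name) && !has_marker(name) &&
               !have_file(name, avail)) {
        /* Rule 3: any other name needs a matching locale file. */
        return false;
    }
    /* Rule 2 (marker names), builtins and found files succeed. */
    int r = snprintf(out, outsz, "%s%s", name, suffix);
    return r >= 0 && (size_t)r < outsz;
}
```

With an @-style marker, an environment name that already carries a modifier, say de_DE@euro, would come out as de_DE@euro@virtual. That is the conflict with existing modifiers raised above, and one reason a separately delimited component that gettext can always drop may be preferable.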
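A rough C sketch of the __get_locale contract described in the implementation notes, assuming MAP_FAILED is used as the not-found sentinel. This is not musl's code: struct locale_map, load_file and get_locale_sketch are simplified, hypothetical stand-ins, the is_ctype flag replaces the real category argument, env_name replaces the environment lookup done for "", and falling back to "C.UTF-8" when nothing is set is an assumption. The synthesized map simply keeps the requested name, since how synthesized names should be spelled is left open.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h> /* MAP_FAILED */

/* Hypothetical, simplified stand-in for a loaded locale map. */
struct locale_map {
    char name[24];
    int synthesized; /* 1 if no locale file backs this map */
};

/* Builtin C.UTF-8 map, used here only for LC_CTYPE. */
static const struct locale_map c_dot_utf8 = { "C.UTF-8", 0 };

/* Hypothetical loader: pretend only "de_DE" has a locale file. */
static const struct locale_map *load_file(const char *name)
{
    static const struct locale_map de = { "de_DE", 0 };
    return strcmp(name, "de_DE") ? NULL : &de;
}

/* Proposed contract:
 *   NULL       -> the C locale (no map), as today
 *   MAP_FAILED -> explicitly named locale not found: caller errors out
 *   otherwise  -> a real, builtin, or synthesized map */
static const struct locale_map *get_locale_sketch(int is_ctype,
        const char *name, const char *env_name)
{
    int from_env = !*name;
    if (from_env) name = env_name;
    if (!*name) name = "C.UTF-8"; /* assumed default when nothing is set */

    if (!strcmp(name, "C") || !strcmp(name, "POSIX"))
        return NULL;
    if (!strcmp(name, "C.UTF-8"))
        /* Current behavior: only LC_CTYPE gets a real C.UTF-8 map. */
        return is_ctype ? &c_dot_utf8 : NULL;

    const struct locale_map *m = load_file(name);
    if (m) return m;
    if (!from_env) return MAP_FAILED;

    /* "" with no locale file: synthesize a map carrying the name.
     * Never freed in this sketch. */
    struct locale_map *s = calloc(1, sizeof *s);
    if (!s || strlen(name) >= sizeof s->name) {
        free(s);
        /* Synthesis failed: behave as if the argument were "C.UTF-8". */
        return is_ctype ? &c_dot_utf8 : NULL;
    }
    strcpy(s->name, name);
    s->synthesized = 1;
    return s;
}
```

Keeping NULL as "the C locale" means existing callers that test for null keep working; only callers that can be handed an explicit, unfound name need an extra check against MAP_FAILED, which is why the notes lean toward it over giving the C locale a real locale_map object.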