Date: Wed, 28 Feb 2018 20:13:40 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: setlocale behavior with 'missing' locales

On Wed, Nov 08, 2017 at 12:27:15AM -0500, Rich Felker wrote:
> On Wed, Nov 08, 2017 at 12:03:38AM -0500, Rich Felker wrote:
> > Unfortunately this turns out to have been something of a tradeoff,
> > since there's no way for applications (and, as it turns out,
> > especially tests/test suites) to query whether a particular locale is
> > "really" available. I've been asked to change the behavior to fail on
> > unknown locale names, but of course that's not a working option in
> > light of the above.
> > 
> > I think there may be a solution that makes everyone happy, but I'm not
> > sure yet. I'm going to follow up with a description and analysis of
> > whether it's valid/conforming.
> 
> So here's the possible solution. ISO C leaves the locale selected by
> setlocale(cat,"") implementation-defined. POSIX however
> defines it in terms of the LANG and LC_* environment variables. See
> the CX text in:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
> 
>   "Setting all of the categories of the global locale is similar to
>   successively setting each individual category of the global locale,
>   except that all error checking is done before any actions are
>   performed. To set all the categories of the global locale,
>   setlocale() can be invoked as:
> 
>   setlocale(LC_ALL, "");
> 
>   In this case, setlocale() shall first verify that the values of all
>   the environment variables it needs according to the precedence rules
>   (described in XBD Environment Variables) indicate supported locales.
>   If the value of any of these environment variable searches yields a
>   locale that is not supported (and non-null), setlocale() shall
>   return a null pointer and the global locale shall not be changed. If
>   all environment variables name supported locales, setlocale() shall
>   proceed as if it had been called for each category, using the
>   appropriate value from the associated environment variable or from
>   the implementation-defined default if there is no such value."
> 
> and the Environment Variables text in XBD 8.2:
> 
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
> 
> The former seems to tie our hands: unless the locales determined by
> the environment variables all exist, setlocale is required to fail and
> leave us in the (unacceptable) "C" locale where UTF-8 doesn't work.
> However the latter seems to offer us a way out. After describing how
> the precedence of the variables works, how locale pathnames work if
> localedef is supported (musl doesn't support it), and how
> implementation-provided/defined locale names work, it specifies:
> 
>   "If the locale value is not recognized by the implementation, the
>   behavior is unspecified."
> 
> My optimistic reading of this is that, in the event the locale name
> provided does not correspond to something we recognize, we're free to
> define how it's interpreted, and always interpret it as C.UTF-8.
> 
> What this would achieve is the following:
> 
> 1. setlocale(cat, explicit_locale_name) - succeeds if the locale
>    actually has a definition file, fails and returns a null pointer
>    otherwise.
> 
> 2. setlocale(cat, "") - always succeeds, honoring the environment
>    variable for the category if a locale definition file by that name
>    exists, but otherwise (the unspecified behavior) treating it as if
>    it were C.UTF-8.
> 
> This way, applications that probe for specific locale names can do so
> and determine if they exist, but applications that just want to use
> the default locale the user configured will still avoid catastrophic
> breakage (failure to support UTF-8) even if they encounter "bad" LC_*
> variables.
> 
> Does this approach sound acceptable? I'm fairly content with
> interpreting it as conforming to the standard; I'm mainly concerned
> about whether there might be unforeseen breakage.
> 
> One notable issue is that, right now, we rely on being able to set
> LC_MESSAGES to an arbitrary name even if there's no libc locale
> definition for it; this is because gettext() relies on the name of the
> current LC_MESSAGES locale to find (application-specific) translation
> files that might exist even without a libc translation. I'm not sure
> how we would best keep this working under changes similar to the
> above.

Any further thoughts on this? I'd like to begin addressing these
issues in this release cycle.

I think the above plan works (is conforming, doesn't break things)
except for the LC_MESSAGES issue mentioned at the end. I still don't
have any good ideas for dealing with that. And since gettext can be
used with any category, not just LC_MESSAGES (although LC_MESSAGES is
the normal choice), the issue really applies to all categories. Maybe
we could still use the ("nonexistent") requested locale name in this
case, or some derivative of it that makes clear it's synthesized...?

Rich
