Message-ID: <20140801052953.GA4515@brightrain.aerifal.cx>
Date: Fri, 1 Aug 2014 01:29:53 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Information on locale system in musl 1.1.4

The major new feature in musl 1.1.4 is the locale system. In
accordance with the long-term plan it was based on, it's designed to:

1. Be lightweight -- calling setlocale pulls in around 2k of code when
   static linking on i386.

2. Meet the minimum needs for applications to provide an interface in
   the user's preferred natural language using the official and de
   facto standard interfaces for doing so -- the standard C/POSIX
   locale API and gettext translation API.

3. Avoid complicating the libc or applications that call setlocale in
   ways that impact security, introduce bugs that only occur in
   unusual locales, or discourage developers of light applications
   from calling setlocale.

The version of the locale system in musl 1.1.4 is still incomplete and
experimental. However, its experimental status should not impact use
on existing deployments; locales are not loaded at all unless the
MUSL_LOCPATH environment variable is set.

The features presently supported are:

- The setting of the LC_MESSAGES locale category is recorded
  regardless of whether a libc locale file is available to be loaded.
  This will be used by the gettext interfaces if the application uses
  gettext message translation and can be retrieved by the application
  by calling setlocale(LC_MESSAGES, 0).

- Message translation for most messages produced by libc, including
  error and signal name strings, controlled by LC_MESSAGES.

- Translated day/month names and appropriate date/time format strings,
  controlled by LC_TIME.
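
As a rough illustration of the supported features listed above, an
application might exercise them along these lines. This is only a
sketch; report_locale is a hypothetical helper name, and the output
depends entirely on the environment and on which locale files (if any)
are installed.

```c
#include <errno.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: pick up the locale from the environment and
 * print a few locale-dependent strings. Returns 0 on success. */
int report_locale(FILE *out)
{
    char day[64];
    time_t now = time(NULL);
    struct tm *tm = localtime(&now);
    const char *msgs;

    /* "" asks for the settings from LC_ALL / LC_* / LANG. */
    setlocale(LC_ALL, "");

    /* Passing a null pointer queries the current setting. */
    msgs = setlocale(LC_MESSAGES, 0);
    if (!msgs)
        return -1;
    fprintf(out, "LC_MESSAGES: %s\n", msgs);

    /* Error strings follow LC_MESSAGES. */
    fprintf(out, "ENOENT: %s\n", strerror(ENOENT));

    /* Day names follow LC_TIME. */
    if (tm && strftime(day, sizeof day, "%A", tm))
        fprintf(out, "today: %s\n", day);
    return 0;
}
```

Calling report_locale(stdout) from main with, say, LANG set to the
name of an installed locale file should show whether translations are
being picked up.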

The key missing features which will definitely be added at some time
in the future are collation rules (LC_COLLATE) and currency
information and monetary numeric formatting (LC_MONETARY).

Finding locale files:

If the MUSL_LOCPATH environment variable is set, it's treated as a
colon-delimited list of directories to search for locale files. The
locale file must have the exact same name as the locale setting being
requested. Locale names greater than 15 bytes in length, starting with
a '.', or containing the '/' character are rejected.
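
To make the search order concrete, here is a minimal sketch of how
candidate file names could be derived from a colon-delimited path.
It is not musl's actual code; locale_candidate and the directory
names in the usage note are made up for illustration, and how empty
path components are treated is an assumption of the sketch.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not musl's implementation): write the
 * index-th candidate "<dir>/<name>" from a colon-delimited locpath
 * into buf. Returns 1 if a candidate was written, 0 if locpath has
 * no such entry or the result does not fit in buf. */
int locale_candidate(const char *locpath, const char *name,
                     size_t index, char *buf, size_t buflen)
{
    const char *p = locpath;
    size_t i;

    for (i = 0; ; i++) {
        /* Length of the current directory component. */
        size_t len = strcspn(p, ":");

        if (i == index) {
            int r = snprintf(buf, buflen, "%.*s/%s",
                             (int)len, p, name);
            return r >= 0 && (size_t)r < buflen;
        }
        if (!p[len])
            return 0;       /* ran out of components */
        p += len + 1;       /* skip past the ':' */
    }
}
```

With locpath "/usr/share/i18n/musl:/opt/loc" and name "de_DE", index 0
gives "/usr/share/i18n/musl/de_DE" and index 1 gives "/opt/loc/de_DE".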

In the future, musl will probably ignore everything after the dot when
the locale name contains a dot, since by convention this component
reflects a character encoding, whereas musl always uses UTF-8. Other
characters may also be rejected in the future; to be safe, locale names
should be restricted to using alphanumeric characters, the underscore,
and the at sign.
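
The naming rules above can be sketched as two checks: one mirroring
what 1.1.4 rejects, and a stricter one for names that should stay
valid in the future. Both function names are hypothetical, and this
is an illustration rather than the code in musl. The empty string
(which setlocale treats as "use the environment") is outside what
these checks are meant to cover.

```c
#include <stdbool.h>
#include <string.h>

#define LOCNAME_MAX 15  /* longest accepted name, in bytes */

/* Mirrors the current rules: at most 15 bytes, no leading '.',
 * no '/' anywhere in the name. */
bool locale_name_accepted(const char *name)
{
    if (strlen(name) > LOCNAME_MAX)
        return false;
    if (name[0] == '.')
        return false;
    if (strchr(name, '/'))
        return false;
    return true;
}

/* Stricter check for future-proof names: additionally restricts
 * the name to ASCII alphanumerics, '_' and '@'. */
bool locale_name_conservative(const char *name)
{
    const char *p;

    if (!locale_name_accepted(name))
        return false;
    for (p = name; *p; p++) {
        unsigned char c = (unsigned char)*p;
        bool ok = (c >= '0' && c <= '9') ||
                  (c >= 'A' && c <= 'Z') ||
                  (c >= 'a' && c <= 'z') ||
                  c == '_' || c == '@';
        if (!ok)
            return false;
    }
    return true;
}
```

For example, "en_US.UTF-8" passes the first check but not the second,
while "sr_RS@latin" passes both.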

In programs running with elevated privileges (setuid/setgid/etc.), the
MUSL_LOCPATH environment variable is not honored. At present, this
means there is no way to use the locale functionality with such
programs. This deficiency will be addressed in a future release.

Unrecognized locale names:

Any locale name that is not usable for any reason (file not found,
name rejected, error loading, etc.) is treated as an alias for the
built-in C.UTF-8 locale. The motivation for this behavior is to avoid
possibly breaking UTF-8 support when the application depends on
setlocale success for UTF-8 to work; this may be a bigger issue in the
future if musl adopts an abstract 8-bit C locale.
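
One practical consequence is that a setlocale call with an unusable
name is not expected to fail on musl, whereas other libcs may return
a null pointer. A portable application should handle both outcomes.
The helper below is a hypothetical sketch of such a check; what
string setlocale hands back for an aliased name is not specified here,
so the sketch only prints it.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: request a locale and report the outcome.
 * Returns 1 if setlocale accepted the request, 0 if it returned a
 * null pointer (which can happen on libcs that reject unknown
 * names instead of aliasing them). */
int locale_request_succeeded(const char *name)
{
    const char *got = setlocale(LC_ALL, name);

    if (!got) {
        fprintf(stderr, "setlocale rejected \"%s\"\n", name);
        return 0;
    }
    printf("requested \"%s\", setlocale returned \"%s\"\n",
           name, got);
    return 1;
}
```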

Locale file format:

A locale file for use by musl is simply a .mo format file like the
ones used by gettext, and can be created with the msgfmt utility from
the GNU gettext package, gettext-tiny, or possibly other versions.
Translations for message strings and LC_TIME strings (day names, month
names, strftime-style date/time format strings) all go in the same
translation file. The format for monetary and collation data will be
specified at a later time, but will be stored in the same type of
file.

Using gettext:

The gettext translation functions are largely compatible with the
documented interfaces in the GNU gettext manual. This does not include
some more recent, undocumented, ill-designed features in GNU gettext
which are used mostly (only?) by some GNU packages so far. The main
deviation from GNU gettext in the outward behavior is that the
LANGUAGE environment variable is not honored; that topic is covered in
a separate message to the musl list. Also, there is no default path
for translation files, but this should not affect applications since
the documented usage is that calling bindtextdomain is required.
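
For reference, the documented usage pattern looks roughly like the
following. The function name, the domain, and the directory passed in
are placeholders chosen for this sketch; only setlocale,
bindtextdomain, textdomain and gettext are real interfaces.

```c
#include <libintl.h>
#include <locale.h>

/* Hypothetical helper: set up message translation for one text
 * domain and translate a single message. Falls back to the
 * untranslated msgid if setup fails. */
const char *init_and_translate(const char *domain, const char *dir,
                               const char *msgid)
{
    /* Take the locale, including LC_MESSAGES, from the environment. */
    setlocale(LC_ALL, "");

    /* There is no default search path, so bind the domain
     * explicitly to the directory holding the .mo files. */
    if (!bindtextdomain(domain, dir))
        return msgid;
    if (!textdomain(domain))
        return msgid;

    /* gettext returns msgid itself when no translation is found. */
    return gettext(msgid);
}
```

A typical call would be something like
init_and_translate("myapp", "/usr/share/locale", "Hello"), with both
the domain and the directory being whatever the application installs.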
