Date: Sat, 26 Jul 2014 23:27:58 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Call for locales maintainer & contributors

On Sat, Jul 26, 2014 at 11:27:38PM +0200, Wermut wrote:
> Hi
> 
> I don't like the idea of an entirely new tree of locale data written
> from scratch. Glibc has one (with a lot of unmaintained data) and then
> there is also the CLDR repository which aims to be the central source
> for such data, maintained by Unicode. The CLDR data is also used as a
> basis for the Microsoft and Apple locale files and is often maintained
> by national language experts. What I could offer is an effort to write
> some magic code that imports the actual CLDR data and converts the
> relevant information to the musl formatted ones. The CLDR data is
> freely available from: http://cldr.unicode.org/index/downloads

I have no objection to using data from CLDR if there's no restrictive
license, but at first glance it looks like most of the data is outside
the scope of the C/POSIX locale system. What we need is:

1. Weekday and month names (full and abbreviated) - these should
   almost certainly be available from CLDR or other public sources.

2. Time format strings for strftime - unless CLDR has C-oriented data
   like that, these might not be available in a form that's easy to
   automatically adapt. Research on this topic is welcome.

3. Regexes for yes and no responses - seems unlikely to be in CLDR,
   but again I'd be happy for someone to prove me wrong.

4. Translations of the message strings in libc. Note that musl's
   strings already deviate somewhat from the legacy strings used on glibc
   and other systems. For example the strerror strings are adjusted to
   align more closely with the POSIX description and the actual
   situations they arise in than the legacy strings (like "Not a
   typewriter"). I'd like to aim to have our translated strings
   equally modernized. And before really spending a lot of work on
   these we should review the English strings again for possible
   improvements and missing messages (I think some newer error codes
   may be missing).

5. Collation rules - these almost certainly can come from Unicode/CLDR
   but musl does not even support collation yet.

6. Monetary formatting and currency names - these almost certainly can
   come from CLDR or other public sources, but again the code to use
   the data isn't there yet.

> Contribution is not completely open, but interested people normally
> get access if they want to. I got mine within a week.
> 
> This is only a suggestion open to discussion. What do you guys think about it?

Overall I like it. But I think we still need a maintainer to manage
pulling the data, maintaining string translations for messages, etc.
Any comments on my items 1-6 above?

Rich
