Message-ID: <29c742fa52652b0b3ba8f4c39b92c97fd04f6117.camel@postmarketos.org>
Date: Fri, 01 Aug 2025 16:24:39 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>
Cc: Thorsten Glaser <tg@...bsd.de>, musl@...ts.openwall.com
Subject: Re: Planned locale work and community thoughts

On Fri, 01 Aug 2025 at 09:58 -0400, Rich Felker wrote:
> On Fri, Aug 01, 2025 at 11:58:30AM +0200, Pablo Correa Gomez wrote:
> > On Wed, 18 Jun 2025 at 19:14 -0400, Rich Felker wrote:
> > > On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> > > > On Wed, 18 Jun 2025, Rich Felker wrote:
> > > > 
> > > > > Theoretically it's possible the textual grep missed things if
> > > > > there is inconsistent json formatting anywhere, so if anyone
> > > > > familiar with jq wants to conduct a search using it instead to
> > > > > confirm, go ahead. I
> > > > 
> > > > My jq-foo is not very good, but I managed this:
> > > > 
> > > > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json |
> > > >   jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' |
> > > >   sed 's/">>/>>/' | grep -e '^  "[^.,]"' -e '^  ".[^"]' | uniq
> > > >   "٫"
> > > > 
> > > > So yes, U+066B is the only other one, and no multi-char ones.
> > > > 
> > > > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json |
> > > >   jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' |
> > > >   sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]'
> > > > 
> > > > … shows all the occurrences, but a quick filter shows that we
> > > > have both symbols-numberSystem-arabext and
> > > > symbols-numberSystem-arab but assuming both are out of scope…
> > > > 
> > > > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json |
> > > >   jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' |
> > > >   sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]' |
> > > >   fgrep '>>' |
> > > >   fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
> > > >   >>main.bgn-AE.numbers.symbols-numberSystem-latn",
> > > >   >>main.bgn-AF.numbers.symbols-numberSystem-latn",
> > > >   >>main.bgn-IR.numbers.symbols-numberSystem-latn",
> > > >   >>main.bgn-OM.numbers.symbols-numberSystem-latn",
> > > >   >>main.bgn.numbers.symbols-numberSystem-latn",
> > > > 
> > > > … leaves us with this; bgn/numbers.json as an example:
> > > > 
> > > > {
> > > >   "main": {
> > > >     "bgn": {
> > > >       "numbers": {
> > > >         "symbols-numberSystem-arabext": {
> > > >           "decimal": "٫",
> > > >           "group": "٬",
> > > >           "list": "؛",
> > > > …
> > > >         },
> > > >         "symbols-numberSystem-latn": {
> > > >           "decimal": "٫",
> > > >           "group": "،",
> > > >           "list": ";",
> > > > …
> > > > 
> > > > So, if the bgn locales are ever going to be relevant…
> > > > unsure what that exactly is, but my acronyms database says…
> > > >  [ISO 639-3] Western Balochi (cf. bal)
> > > > … which seems to fit.
> > > 
> > > Thanks. My grepping seems to have overlooked that just because it
> > > was the same character that would normally only be used in an
> > > alt-digits context. I wonder if the above is intentional or a
> > > mistake and if any systems are actually doing that.
> > 
> > I've done some research on this topic to see if we could figure out
> > a bit more information. Unfortunately, online resources related to
> > Western Balochi are incredibly sparse:
> > 
> > * glibc: no support at all
> > https://github.com/bminor/glibc/tree/master/localedata/locales
> > * Windows: no support at all
> > https://support.microsoft.com/en-us/windows/language-packs-for-windows-a5094319-a92d-18de-5b53-1cfc697cfca8
> > * Android: no support at all
> > https://android.googlesource.com/platform/frameworks/base/+/android-16.0.0_r1/core/res/res/values/locale_config.xml
> > * Weblate: 3 projects seem to have translations, but at 0%
> > translated: https://hosted.weblate.org/languages/bgn/
> > * iOS: no support in system languages
> > https://www.apple.com/ios/feature-availability/#system-language-system-language
> > or keyboard support
> > https://www.apple.com/ios/feature-availability/#quicktype-keyboard-language-support
> > 
> > In addition, it seems that the data in the CLDR was introduced 10
> > years ago in
> > https://github.com/unicode-org/cldr/commit/a4fe61ea1c1a01e3dfe2545d013ca3289640c81f
> > and has never changed since. I've also tried to do some research on
> > whether the data in the CLDR could be an error. The survey for
> > Western Balochi unfortunately shows no votes:
> > https://st.unicode.org/cldr-apps/v#/bgn/Symbols/a1ef41eaeb6982d
> > compared to something like Spanish that has votes from Apple,
> > Microsoft, and Google:
> > https://st.unicode.org/cldr-apps/v#/es/Symbols/4ec3d1b99830ad07
> > 
> > I wonder if it's worth bringing this to the attention of the
> > Unicode Consortium to get some clarity on it, or if we just consider
> > this a bug from a language with very limited digitalization and move
> > on with the assumption of just "." and ",".
> 
> Thanks for the quick research!
> 
> My view is that unless there's an existing strong precedent for this
> convention in digital interfaces, which you seem to have established
> that there's not, we should not pursue supporting it.
> 
> I'm fine with leaving open the possibility in the data format (i.e.
> not just encoding the value in the locale file as 1 bit) so that the
> possibility isn't locked out, but I'm pretty strongly on the side of
> either mapping anything but ',' to '.', or refusing to load locale
> files where the field is neither '.' nor ',' as
> unsupported/malformed.

Seems like the best of both worlds :)
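
For illustration only, the two options above could look roughly like
the following C sketch. This is not musl code: the names radix_policy,
RADIX_MAP_TO_DOT, RADIX_REJECT and radix_from_field are hypothetical,
and it assumes the locale file stores the decimal separator as a
NUL-terminated UTF-8 string rather than as a single bit, so that other
values remain representable in the format even though they are never
honored.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policies for a decimal-separator field that is
 * neither "." nor "," (for example U+066B as found in the bgn data). */
enum radix_policy {
    RADIX_MAP_TO_DOT, /* treat anything but "," as "." */
    RADIX_REJECT      /* treat the locale file as unsupported/malformed */
};

/* Returns '.' or ',' for the radix character to use, or 0 if the
 * field is rejected under RADIX_REJECT. The field is compared as a
 * whole string, so multi-byte values never match by accident. */
static char radix_from_field(const char *field, enum radix_policy policy)
{
    if (field && !strcmp(field, ","))
        return ',';
    if (field && !strcmp(field, "."))
        return '.';
    return policy == RADIX_MAP_TO_DOT ? '.' : 0;
}
```

With this sketch, a field holding U+066B ("\xd9\xab" in UTF-8) yields
'.' under RADIX_MAP_TO_DOT and 0 under RADIX_REJECT, and in the second
case the caller would refuse to load the file.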

> 
> I just don't see any way to rationalize doing something that likely
> has unforeseen security consequences for the sake of a generality that
> no existing users expect (because there's no software that has set
> that expectation).
> 
> Rich
