Message-ID: <67d29f815bba91bd7bca96c4308ac2667f77ac82.camel@postmarketos.org>
Date: Fri, 01 Aug 2025 11:58:30 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>, Thorsten Glaser <tg@...bsd.de>
Cc: musl@...ts.openwall.com
Subject: Re: Planned locale work and community thoughts

On Wed, 18-06-2025 at 19:14 -0400, Rich Felker wrote:
> On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> > On Wed, 18 Jun 2025, Rich Felker wrote:
> >
> > > Theoretically it's possible the textual grep missed things if there is
> > > inconsistent json formatting anywhere, so if anyone familiar with jq
> > > wants to conduct a search using it instead to confirm, go ahead. I
> >
> > My jq-foo is not very good, but I managed this:
> >
> > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^  "[^.,]"' -e '^  ".[^"]' | uniq
> >   "٫"
> >
> > So yes, U+066B is the only other one, and no multi-char ones.
> >
> > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]'
> >
> > … shows all the occurrences, but a quick filter shows that we have
> > both symbols-numberSystem-arabext and symbols-numberSystem-arab but
> > assuming both are out of scope…
> >
> > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
> >   >>main.bgn-AE.numbers.symbols-numberSystem-latn",
> >   >>main.bgn-AF.numbers.symbols-numberSystem-latn",
> >   >>main.bgn-IR.numbers.symbols-numberSystem-latn",
> >   >>main.bgn-OM.numbers.symbols-numberSystem-latn",
> >   >>main.bgn.numbers.symbols-numberSystem-latn",
> >
> > … leaves us with this; bgn/numbers.json exemplary:
> >
> > {
> >   "main": {
> >     "bgn": {
> >       "numbers": {
> >         "symbols-numberSystem-arabext": {
> >           "decimal": "٫",
> >           "group": "٬",
> >           "list": "؛",
> >           …
> >         },
> >         "symbols-numberSystem-latn": {
> >           "decimal": "٫",
> >           "group": "،",
> >           "list": ";",
> >           …
> >
> > So, if the bgn locales are ever going to be relevant…
> > unsure what that exactly is, but my acronyms database says…
> > [ISO 639-3] Western Balochi (cf. bal)
> > … which seems to fit.
>
> Thanks. My grepping seems to have overlooked that just because it was
> the same character that would normally only be used in an alt-digits
> context. I wonder if the above is intentional or a mistake and if any
> systems are actually doing that.

I've done some research on this topic to see whether we could find out more.
Unfortunately, online resources related to Western Balochi are incredibly sparse:

* glibc: no support at all
  https://github.com/bminor/glibc/tree/master/localedata/locales
* Windows: no support at all
  https://support.microsoft.com/en-us/windows/language-packs-for-windows-a5094319-a92d-18de-5b53-1cfc697cfca8
* Android: no support at all
  https://android.googlesource.com/platform/frameworks/base/+/android-16.0.0_r1/core/res/res/values/locale_config.xml
* Weblate: 3 projects seem to have translations, but all of them are at 0%
  https://hosted.weblate.org/languages/bgn/
* iOS: no support in system languages
  https://www.apple.com/ios/feature-availability/#system-language-system-language
  or in keyboards
  https://www.apple.com/ios/feature-availability/#quicktype-keyboard-language-support

In addition, it seems that this data was introduced into the CLDR 10 years ago in
https://github.com/unicode-org/cldr/commit/a4fe61ea1c1a01e3dfe2545d013ca3289640c81f
and has not changed since.

I've also tried to find out whether the data in the CLDR could be an error. The survey for Western Balochi unfortunately shows no votes:
https://st.unicode.org/cldr-apps/v#/bgn/Symbols/a1ef41eaeb6982d
whereas Spanish, for example, has votes from Apple, Microsoft, and Google:
https://st.unicode.org/cldr-apps/v#/es/Symbols/4ec3d1b99830ad07

I wonder whether it's worth bringing this to the attention of the Unicode Consortium to get some clarity on it, or whether we should just consider it a bug in a language with very limited digitalization and move on with the assumption of just "." and ",".

Best,
Pablo Correa Gomez
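Thorsten's jq pipeline can be restated as a small Python sketch, which may be easier to extend than the jq/sed/grep chain. It assumes the cldr-json layout implied by `cat */numbers.json` (one `<locale>/numbers.json` per locale under `main`). The names `iter_decimal_paths`, `unusual_decimals`, `scan_cldr`, and `SAMPLE` are hypothetical, and a set lookup stands in for the two grep patterns: any decimal separator other than "." or "," counts as unusual. The Arabic-digit numbering systems are skipped by default, mirroring the `fgrep -v` step.

```python
import json
import sys
from pathlib import Path
from typing import Any, Iterator, List, Tuple

# Hypothetical helpers for illustration; not part of musl, jq, or cldr-json.
ORDINARY = {".", ","}  # the two separators the thread proposes to assume


def iter_decimal_paths(node: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, str]]:
    """Yield (path, value) for every object whose "decimal" key is a string.

    Roughly what jq's `paths(.decimal?|scalars) as $p | getpath($p).decimal`
    produces, with the path kept as a tuple of keys/indices.
    """
    if isinstance(node, dict):
        value = node.get("decimal")
        if isinstance(value, str):
            yield path, value
        for key, child in node.items():
            yield from iter_decimal_paths(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_decimal_paths(child, path + (index,))


def unusual_decimals(doc: Any, skip_systems=("arab", "arabext")) -> List[Tuple[str, str]]:
    """Return (dotted path, separator) pairs whose separator is not "." or ",".

    Objects named symbols-numberSystem-<name> for a skipped <name> are ignored.
    """
    skipped = {f"symbols-numberSystem-{name}" for name in skip_systems}
    found = []
    for path, value in iter_decimal_paths(doc):
        if value in ORDINARY or (path and path[-1] in skipped):
            continue
        found.append((".".join(str(part) for part in path), value))
    return found


def scan_cldr(main_dir: Path) -> List[Tuple[str, str]]:
    """Scan <main_dir>/*/numbers.json, like `cat */numbers.json` above."""
    results = []
    for json_file in sorted(main_dir.glob("*/numbers.json")):
        with json_file.open(encoding="utf-8") as handle:
            results.extend(unusual_decimals(json.load(handle)))
    return results


# Trimmed copy of the bgn excerpt quoted in the thread (decimal/group only).
SAMPLE = {"main": {"bgn": {"numbers": {
    "symbols-numberSystem-arabext": {"decimal": "\u066b", "group": "\u066c"},
    "symbols-numberSystem-latn": {"decimal": "\u066b", "group": "\u060c"},
}}}}

if __name__ == "__main__":
    if len(sys.argv) > 1:
        for dotted, sep in scan_cldr(Path(sys.argv[1])):
            codepoints = " ".join(f"U+{ord(c):04X}" for c in sep)
            print(dotted, codepoints)
    else:
        print(unusual_decimals(SAMPLE))
```

With no arguments it only checks the trimmed bgn sample, where the `latn` entry is reported and the `arabext` entry is skipped. Given the path to an unpacked `cldr-numbers-full/main` directory, it should list the same kind of `symbols-numberSystem-latn` paths as the final pipeline, assuming that layout holds.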