Message-ID: <29c742fa52652b0b3ba8f4c39b92c97fd04f6117.camel@postmarketos.org>
Date: Fri, 01 Aug 2025 16:24:39 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>
Cc: Thorsten Glaser <tg@...bsd.de>, musl@...ts.openwall.com
Subject: Re: Planned locale work and community thoughts

On Fri, 01-08-2025 at 09:58 -0400, Rich Felker wrote:
> On Fri, Aug 01, 2025 at 11:58:30AM +0200, Pablo Correa Gomez wrote:
> > On Wed, 18-06-2025 at 19:14 -0400, Rich Felker wrote:
> > > On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> > > > On Wed, 18 Jun 2025, Rich Felker wrote:
> > > >
> > > > > Theoretically it's possible the textual grep missed things if there is
> > > > > inconsistent json formatting anywhere, so if anyone familiar with jq
> > > > > wants to conduct a search using it instead to confirm, go ahead. I
> > > >
> > > > My jq-foo is not very good, but I managed this:
> > > >
> > > > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^ "[^.,]"' -e '^ ".[^"]' | uniq
> > > > "٫"
> > > >
> > > > So yes, U+066B is the only other one, and no multi-char ones.
> > > >
> > > > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^ "[^.,]"' -e '^ ".[^"]'
> > > >
> > > > … shows all the occurrences, but a quick filter shows that we have
> > > > both symbols-numberSystem-arabext and symbols-numberSystem-arab, but
> > > > assuming both are out of scope…
> > > >
> > > > tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^ "[^.,]"' -e '^ ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
> > > > >>main.bgn-AE.numbers.symbols-numberSystem-latn",
> > > > >>main.bgn-AF.numbers.symbols-numberSystem-latn",
> > > > >>main.bgn-IR.numbers.symbols-numberSystem-latn",
> > > > >>main.bgn-OM.numbers.symbols-numberSystem-latn",
> > > > >>main.bgn.numbers.symbols-numberSystem-latn",
> > > >
> > > > … leaves us with this; bgn/numbers.json as an example:
> > > >
> > > > {
> > > >   "main": {
> > > >     "bgn": {
> > > >       "numbers": {
> > > >         "symbols-numberSystem-arabext": {
> > > >           "decimal": "٫",
> > > >           "group": "٬",
> > > >           "list": "؛",
> > > >           …
> > > >         },
> > > >         "symbols-numberSystem-latn": {
> > > >           "decimal": "٫",
> > > >           "group": "،",
> > > >           "list": ";",
> > > >           …
> > > >
> > > > So, if the bgn locales are ever going to be relevant…
> > > > unsure what that exactly is, but my acronyms database says…
> > > > [ISO 639-3] Western Balochi (cf. bal)
> > > > … which seems to fit.
> > >
> > > Thanks. My grepping seems to have overlooked that just because it was
> > > the same character that would normally only be used in an alt-digits
> > > context.
> > > I wonder if the above is intentional or a mistake and if any
> > > systems are actually doing that.
> >
> > I've done some research on this topic to see if we could figure out a
> > bit more information. Unfortunately, online resources related to
> > Western Balochi are incredibly sparse:
> >
> > * glibc: no support at all
> >   https://github.com/bminor/glibc/tree/master/localedata/locales
> > * Windows: no support at all
> >   https://support.microsoft.com/en-us/windows/language-packs-for-windows-a5094319-a92d-18de-5b53-1cfc697cfca8
> > * Android: no support at all
> >   https://android.googlesource.com/platform/frameworks/base/+/android-16.0.0_r1/core/res/res/values/locale_config.xml
> > * Weblate: 3 projects seem to have translations, but at 0%
> >   translation: https://hosted.weblate.org/languages/bgn/
> > * iOS: no support in system languages
> >   https://www.apple.com/ios/feature-availability/#system-language-system-language
> >   or keyboard support
> >   https://www.apple.com/ios/feature-availability/#quicktype-keyboard-language-support
> >
> > In addition, it seems that this data was introduced into the CLDR 10
> > years ago in
> > https://github.com/unicode-org/cldr/commit/a4fe61ea1c1a01e3dfe2545d013ca3289640c81f
> > and has never changed since. I've also tried to do some research on
> > whether the data in the CLDR could be an error. The survey for Western
> > Balochi unfortunately shows no votes:
> > https://st.unicode.org/cldr-apps/v#/bgn/Symbols/a1ef41eaeb6982d
> > compared to something like Spanish that has votes from Apple,
> > Microsoft, and Google:
> > https://st.unicode.org/cldr-apps/v#/es/Symbols/4ec3d1b99830ad07
> >
> > I wonder if it's worth it to bring this to the attention of the Unicode
> > Consortium to get some clarity on it, or if we just consider this a bug
> > from a language with very limited digitalization and move on with the
> > assumption of just "." and ",".
>
> Thanks for the quick research!
>
> My view is that unless there's an existing strong precedent for this
> convention in digital interfaces, which you seem to have established
> that there's not, we should not pursue supporting it.
>
> I'm fine with leaving open the possibility in the data format (i.e.
> not just encoding the value in the locale file as 1 bit) so that the
> possibility isn't locked out, but I'm pretty strongly on the side of
> either mapping anything but ',' to '.', or refusing to load locale
> files where the field is neither '.' nor ',' as
> unsupported/malformed.

Seems like the best of both worlds :)

>
> I just don't see any way to rationalize doing something that likely
> has unforeseen security consequences for the sake of a generality that
> no existing users expect (because there's no software that has set
> that expectation).
>
> Rich
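
For illustration, the two policies Rich describes (refuse to load the locale file, or map anything but ',' to '.') could be sketched in C roughly as follows. This is only a sketch under stated assumptions: the function names decimal_point_strict and decimal_point_lenient are hypothetical and are not musl APIs, and the code assumes the locale file's decimal field has already been read into a NUL-terminated string. It does not reflect how musl's locale loader is or will be implemented.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of musl: strict policy.
 * Returns '.' or ',' when the field is exactly that one-byte string.
 * Returns 0 for anything else (missing, empty, multi-byte such as
 * U+066B), which a caller could treat as "unsupported/malformed". */
char decimal_point_strict(const char *field)
{
    if (field && (!strcmp(field, ".") || !strcmp(field, ",")))
        return field[0];
    return 0;
}

/* Hypothetical helper, not part of musl: lenient policy.
 * Anything that is not exactly "," is mapped to '.'. */
char decimal_point_lenient(const char *field)
{
    return (field && !strcmp(field, ",")) ? ',' : '.';
}
```

With the bgn data quoted above, the decimal field is U+066B, which is the two bytes D9 AB in UTF-8. The strict variant would reject it and the lenient variant would fall back to '.'. Either way the stored field stays a string rather than a single bit, so the data format does not lock out other values later.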