Message-ID: <d3b94e19-7754-48f0-60f7-28ae19009d3f@mirbsd.de>
Date: Thu, 19 Jun 2025 00:42:50 +0200 (CEST)
From: Thorsten Glaser <tg@...bsd.de>
To: musl@...ts.openwall.com
cc: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Subject: Re: Planned locale work and community thoughts

On Wed, 18 Jun 2025, Rich Felker wrote:

>Theoretically it's possible the textual grep missed things if there is
>inconsistent json formatting anywhere, so if anyone familiar with jq
>wants to conduct a search using it instead to confirm, go ahead. I

My jq-fu is not very good, but I managed this:

tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^  "[^.,]"' -e '^  ".[^"]' | uniq
  "٫"

So yes, U+066B (ARABIC DECIMAL SEPARATOR) is the only other one, and
there are no multi-char ones.
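
For anyone without jq at hand, the same search could be cross-checked with
a few lines of Python. This is only a sketch: the function names
(find_decimals, unusual_decimals, scan) are made up for illustration, the
"xx"/"yy"/"zz" sample locales are invented rather than real CLDR data, and
the */numbers.json layout is simply assumed from the commands above.

```python
import glob
import json


def find_decimals(node, path=()):
    """Yield (path, value) for every string-valued "decimal" key in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "decimal" and isinstance(value, str):
                yield path + (key,), value
            else:
                yield from find_decimals(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_decimals(item, path + (str(index),))


def unusual_decimals(doc, expected=(".", ",")):
    """Return the sorted set of decimal separators not listed in `expected`."""
    return sorted({value for _, value in find_decimals(doc) if value not in expected})


def scan(pattern="*/numbers.json"):
    """Collect unusual separators across all files matching `pattern`."""
    found = set()
    for name in glob.glob(pattern):
        with open(name, encoding="utf-8") as handle:
            found.update(unusual_decimals(json.load(handle)))
    return sorted(found)


# Invented sample (NOT real CLDR content): two ordinary separators, one U+066B.
sample = {
    "main": {
        "xx": {"numbers": {"symbols-numberSystem-latn": {"decimal": "."}}},
        "yy": {"numbers": {"symbols-numberSystem-latn": {"decimal": "\u066b"}}},
        "zz": {"numbers": {"symbols-numberSystem-latn": {"decimal": ","}}},
    }
}

# Print each unusual separator as code points; a multi-char value would
# show up as several code points on one line.
for value in unusual_decimals(sample):
    print(" ".join(f"U+{ord(ch):04X}" for ch in value))
```

On the invented sample this prints U+066B; running scan() from within
cldr-numbers-full/main would be the equivalent of the pipeline above.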

tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]'

… shows all the occurrences. A quick filter shows hits under both
symbols-numberSystem-arabext and symbols-numberSystem-arab, but
assuming both are out of scope…

tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
  >>main.bgn-AE.numbers.symbols-numberSystem-latn",
  >>main.bgn-AF.numbers.symbols-numberSystem-latn",
  >>main.bgn-IR.numbers.symbols-numberSystem-latn",
  >>main.bgn-OM.numbers.symbols-numberSystem-latn",
  >>main.bgn.numbers.symbols-numberSystem-latn",

… leaves us with this; bgn/numbers.json as an example:

{
  "main": {
    "bgn": {
      "numbers": {
        "symbols-numberSystem-arabext": {
          "decimal": "٫",
          "group": "٬",
          "list": "؛",
…
        },
        "symbols-numberSystem-latn": {
          "decimal": "٫",
          "group": "،",
          "list": ";",
…
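
The out-of-scope filtering can be sketched in Python too, under the same
caveats: symbol_tables and in_scope_hits are hypothetical helper names,
OUT_OF_SCOPE just mirrors the two fgrep -v patterns, and the sample is the
bgn excerpt above cut down to its "decimal" entries.

```python
# Numbering systems assumed to be out of scope, as in the fgrep -v step.
OUT_OF_SCOPE = {"symbols-numberSystem-arab", "symbols-numberSystem-arabext"}


def symbol_tables(node, path=()):
    """Yield (path, table) for every dict that directly holds a string "decimal"."""
    if isinstance(node, dict):
        if isinstance(node.get("decimal"), str):
            yield path, node
        for key, value in node.items():
            yield from symbol_tables(value, path + (key,))


def in_scope_hits(doc, expected=(".", ",")):
    """Return dotted paths of in-scope tables whose decimal is not in `expected`."""
    hits = []
    for path, table in symbol_tables(doc):
        if path and path[-1] in OUT_OF_SCOPE:
            continue  # skip arab/arabext, where U+066B is unsurprising
        if table["decimal"] not in expected:
            hits.append(".".join(path))
    return hits


# The bgn excerpt from this message, reduced to the "decimal" entries.
sample = {
    "main": {
        "bgn": {
            "numbers": {
                "symbols-numberSystem-arabext": {"decimal": "\u066b"},
                "symbols-numberSystem-latn": {"decimal": "\u066b"},
            }
        }
    }
}

print(in_scope_hits(sample))
```

For this excerpt the only remaining path is
main.bgn.numbers.symbols-numberSystem-latn, matching the grep output.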

So this only matters if the bgn locales are ever going to be relevant.
I am unsure what exactly bgn is, but my acronyms database says…
	[ISO 639-3] Western Balochi (cf. bal)
… which seems to fit.

bye,
//mirabilos
-- 
<ch> you introduced a merge commit        │<mika> % g rebase -i HEAD^^
<mika> sorry, no idea and rebasing just fscked │<mika> Segmentation
<ch> should have cloned into a clean repo      │  fault (core dumped)
<ch> if I rebase that now, it's really ugh     │<mika:#grml> wuahhhhhh
