Message-ID: <20250618212359.GB1827@brightrain.aerifal.cx>
Date: Wed, 18 Jun 2025 17:23:59 -0400
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com
Subject: Re: Planned locale work and community thoughts

On Wed, Jun 18, 2025 at 03:28:47PM -0400, Rich Felker wrote:
> > * Implement RADIXCHAR so that "." is not the only possible separator.
> > THOUSEP will in principle not be implemented due to it breaking quite
> > some assumptions, and it being less critical for users.
> 
> To give some background on this: from the start I was largely opposed
> to having the radix char be localizable at all, as this has been a
> source of perpetual problems for parsing and generating text-based
> data formats intended for interchange, and I didn't really think there
> was any modern demand for it.
> 
> However, in past discussions of the topic, it's come up that some
> people do want it, and I don't want us to be the bad guys who are
> being stubborn dismissing someone else's cultural expectations, so the
> tentative plan has been to offer this with 1-bit degree of freedom
> between '.' and ',' as the only choices.
> 
> I've been made aware that, at least historically prior to use in
> computer systems, there have been other notations for radix point, but
> it's not clear if there's any modern expectation to be able to do
> that. What I think would be a useful next step is to grep the Unicode
> CLDR for whether there are non-'.' non-',' radix chars in any locale
> definitions. If there are none, I think that already settles it. If
> there are any, we should attempt to figure out whether there are
> real-world systems that support them and precedent for users to expect
> they work.
> 
> Note that supporting basically anything plausible other than '.' and
> ',' as radix characters has major technical issues that may introduce
> vulns into programs not expecting it, so in the absence of both strong
> evidence of necessity and research into what would break and whether
> unsafe breakage is unlikely, I want to just say no to this.
> 
> It may however make sense for the on-disk data format to allow for the
> possibility, and for musl to just treat anything but "," as if it were

I've run a textual grep on the data from cldr-47.0.0-json-full.zip:

    grep '"decimal": *"[^,.]"' cldr-numbers-full/main/*/numbers.json

and the only results seem to be for alternative-numerals Arabic
profiles under "symbols-numberSystem-arabext", which is not
used/usable in the C/POSIX locale system.

(There is an alternate symbol, but it's only used with alternate
numeral characters, and C/POSIX can't use alternate numeral characters
in their locale model.)

In theory the textual grep could have missed something if the JSON
formatting is inconsistent anywhere, so if anyone familiar with jq
wants to repeat the search with it to confirm, go ahead. I think
we're good though.
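
For anyone who would rather not depend on the exact text layout, here is
a minimal sketch of a structural version of the same check, written in
Python rather than jq. It assumes the archive is unpacked so that the
cldr-numbers-full/main/*/numbers.json paths from the grep above exist;
the function names are made up for illustration. It does not assume any
particular nesting inside the files: it walks each parsed document and
reports every "decimal" value that is neither "." nor ",". Unlike the
grep pattern, it would also report multi-character values.

```python
import json
import sys
from pathlib import Path


def find_decimals(node, path=()):
    """Yield (path, value) for every string-valued "decimal" key in node."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "decimal" and isinstance(value, str):
                yield path + (key,), value
            else:
                yield from find_decimals(value, path + (key,))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_decimals(item, path + (str(index),))


def unusual_decimals(root):
    """Return (file, key path, value) for each decimal symbol other than '.' or ','."""
    hits = []
    # Assumed layout: <root>/<locale>/numbers.json, as in the grep command.
    for numbers_file in sorted(Path(root).glob("*/numbers.json")):
        with open(numbers_file, encoding="utf-8") as fh:
            data = json.load(fh)
        for path, value in find_decimals(data):
            if value not in (".", ","):
                hits.append((str(numbers_file), "/".join(path), value))
    return hits


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "cldr-numbers-full/main"
    for file_name, key_path, value in unusual_decimals(root):
        print(f"{file_name}: {key_path} = {value!r}")
```

Saved under any name (say check_decimal.py, a hypothetical file name),
it can be run as "python3 check_decimal.py cldr-numbers-full/main". Each
output line names the file and the key path, so it is easy to see which
numbering system a hit belongs to.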

Rich
