Date: Sat, 16 Dec 2023 18:10:37 -0500
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gómez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com,
	Pablo Correa Gómez <ablocorrea@...mail.com>
Subject: Re: [PATCH 0/2] Support printing localized RADIXCHAR

On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote:
> From: Pablo Correa Gómez <ablocorrea@...mail.com>
> 
> Since we've been discussing translations, I've been looking around a
> bit and have found some low-hanging fruit, in the form of improving
> printf-family output for localized systems.
> 
> I've tried to do the same for strtof family of functions, but I was not
> completely sure on how to approach that. Forcing the radix char there
> has the problem that numeric values as written for programming stop
> being supported, and treating equally a "." and the localized case seems
> to not be supported by POSIX. Does anybody have any thoughts about this?
> Without that, this patch series might be a bit incomplete, since
> certain localized printf outputs would not be possible to ingest in
> strtof. Although I'm also equally unsure whether that's a requirement.
> 
> Pablo Correa Gómez (2):
>   langinfo: add support for LC_NUMERIC translations
>   printf: translate RADIXCHAR for floating-point numbers
> 
>  src/locale/langinfo.c | 2 +-
>  src/stdio/vfprintf.c  | 5 +++--
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> --
> 2.43.0

This is a topic that's been controversial. I have always been against
having a variable radix character, but I've also been seeking input
from users who want localized output on whether the lack of this
functionality is a serious problem that needs revisiting.

Last time it was discussed, I believe my position was that, if we do
this, it needs to be a 1-bit setting, where a locale necessarily has
either '.' or ',' as the radix. No other values actually appear in
real-world conventions, and on other implementations such as glibc,
the allowance for arbitrary characters allows doing some ~nasty~ stuff
with output and input processing. For example, you could define the
radix character to be '1' or something that makes conversions fail to
round-trip.
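
To make the round-trip problem concrete, here is a minimal sketch. The
helpers fmt_radix and radix_from_bit are hypothetical names made up for
illustration and are not part of musl or of the patch. The sketch
assumes the program runs in the default "C" locale, so snprintf emits
'.' before the substitution.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: map a 1-bit locale setting to the only two radix
 * characters that would be allowed. */
static char radix_from_bit(int comma)
{
	return comma ? ',' : '.';
}

/* Hypothetical: format x with two decimals, then overwrite the '.'
 * with an arbitrary single-byte radix. Nothing here stops the caller
 * from passing a digit, which is exactly the problem. */
static void fmt_radix(char *buf, size_t n, double x, char radix)
{
	assert(buf && n > 0);
	snprintf(buf, n, "%.2f", x);
	char *p = strchr(buf, '.');
	if (p) *p = radix;
}
```

With radix_from_bit(1), the value 21.50 is written as "21,50", which
can be parsed back unambiguously. With a radix of '1' the same value is
written as "21150", and a reader has no way to tell where the integer
part ends.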

As written to support arbitrary radix characters, the patch also fails
to handle the case where the radix character is multi-byte, copying
only a single byte of it and thereby producing broken output. This is
actually a nasty case where printf semantics for field width are not
what the caller is likely to expect, and it breaks our wide printf
implementation, which assumes when it uses byte-based printf for
numbers that the byte count and character count are the same.
Supporting only '.' and ',' avoids all of these issues, too.
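
The byte-count versus character-count mismatch can be sketched as
follows. Both helpers are hypothetical and unrelated to musl's
vfprintf internals, and U+066B (UTF-8 bytes D9 AB) is used only as an
arbitrary example of a two-byte character, again assuming the "C"
locale for snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count characters in a UTF-8 string by skipping continuation bytes
 * (those of the form 10xxxxxx). */
static size_t utf8_chars(const char *s)
{
	size_t n = 0;
	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80) n++;
	return n;
}

/* Hypothetical: format x with one decimal and splice in the whole
 * radix string in place of '.'. Copying only radix[0] instead would
 * leave a truncated, invalid UTF-8 sequence in the output. */
static void fmt_mb_radix(char *buf, size_t n, double x, const char *radix)
{
	char tmp[64];
	assert(buf && n > 0 && radix);
	snprintf(tmp, sizeof tmp, "%.1f", x);
	char *dot = strchr(tmp, '.');
	if (!dot) {
		snprintf(buf, n, "%s", tmp);
		return;
	}
	snprintf(buf, n, "%.*s%s%s", (int)(dot - tmp), tmp, radix, dot + 1);
}
```

Formatting 1.5 with the radix "\xD9\xAB" yields 4 bytes but only 3
characters, so any code that pads to a field width by bytes, or that
assumes the two counts agree, gets the wrong answer.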

Another detail you've overlooked is that scanf/strto{d,ld,f}/atof need
to process the radix point character. This in turn requires changing
the _l wrappers for strto{d,ld,f} so that they actually apply the
locale argument rather than ignoring it.
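
As a rough illustration of what "applying the locale argument" means on
the input side, here is a toy sketch. struct toy_locale and
toy_strtod_l are invented for this example and do not reflect musl's
locale_t or how a real implementation would be written; it assumes the
process itself is in the "C" locale so that plain strtod expects '.',
and it silently truncates inputs longer than its scratch buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a locale carrying the 1-bit radix setting. */
struct toy_locale { int comma_radix; };

/* Parse s using the radix selected by loc (NULL means '.').
 * The input is copied with the locale's radix mapped to '.', and, in a
 * comma locale, '.' mapped to ',' so that it ends the number instead
 * of being accepted as a second radix. */
static double toy_strtod_l(const char *s, char **end,
                           const struct toy_locale *loc)
{
	char radix = (loc && loc->comma_radix) ? ',' : '.';
	char tmp[128];
	size_t n;

	assert(s);
	n = strlen(s);
	if (n >= sizeof tmp) n = sizeof tmp - 1;
	memcpy(tmp, s, n);
	tmp[n] = 0;

	for (size_t i = 0; i < n; i++) {
		if (tmp[i] == radix) tmp[i] = '.';
		else if (tmp[i] == '.') tmp[i] = ',';
	}

	char *e;
	double v = strtod(tmp, &e);
	/* Translate the end pointer back into the caller's string. */
	if (end) *end = (char *)s + (e - tmp);
	return v;
}
```

With comma_radix set, "1,5" parses as 1.5, while "1.5" parses as 1 and
stops at the '.'. A wrapper that ignores its locale argument cannot
make that distinction.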

Before proceeding on all of this we should probably try to reach a
decision on whether it's really needed/wanted functionality.

Rich
