Date: Sun, 17 Dec 2023 10:22:27 -0500
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gómez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com,
	Pablo Correa Gómez <ablocorrea@...mail.com>
Subject: Re: [PATCH 0/2] Support printing localized RADIXCHAR

On Sat, Dec 16, 2023 at 06:10:37PM -0500, Rich Felker wrote:
> On Sat, Dec 16, 2023 at 08:36:42PM +0100, Pablo Correa Gómez wrote:
> > From: Pablo Correa Gómez <ablocorrea@...mail.com>
> > 
> > Since we've been discussing translations, I've been looking around a bit
> > and have found some low-hanging fruit in the form of improving
> > printf-family output for localized systems.
> > 
> > I've tried to do the same for the strtof family of functions, but I was
> > not completely sure how to approach that. Forcing the radix char there
> > has the problem that numeric values as written for programming stop
> > being supported, and treating "." and the localized character equally
> > does not seem to be supported by POSIX. Does anybody have any thoughts
> > about this? Without that, this patch series might be a bit incomplete,
> > since certain localized printf outputs would not be possible to ingest
> > in strtof. Although I'm equally unsure whether that's a requirement.
> > 
> > Pablo Correa Gómez (2):
> >   langinfo: add support for LC_NUMERIC translations
> >   printf: translate RADIXCHAR for floating-point numbers
> > 
> >  src/locale/langinfo.c | 2 +-
> >  src/stdio/vfprintf.c  | 5 +++--
> >  2 files changed, 4 insertions(+), 3 deletions(-)
> > 
> > --
> > 2.43.0
> 
> This is a topic that's been controversial. I have always been against
> having a variable radix character, but I've also been seeking input from
> users who want localized output on whether the lack of this functionality
> is a serious problem that needs revisiting.
> 
> Last time it was discussed, I believe my position was that, if we do
> this, it needs to be a 1-bit setting, where a locale necessarily has
> either '.' or ',' as the radix. No other values actually appear in
> real-world conventions, and on other implementations such as glibc,
> allowing arbitrary characters makes it possible to do some ~nasty~ stuff
> with output and input processing. For example, you could define the
> radix character to be '1' or something that makes conversions fail to
> round-trip.
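A minimal sketch of the 1-bit idea, assuming a hypothetical struct numeric_sketch and helper names (none of this is existing musl code): the radix is stored as a single bit, so a locale file that tries to define it as "1" is simply rejected at load time.

```c
#include <stdio.h>

/* Hypothetical per-locale numeric data (not musl's real layout): the
 * radix is a single bit, so only '.' or ',' can ever be represented. */
struct numeric_sketch {
    unsigned radix_is_comma : 1;
};

/* Map the bit back to the character printf/strtod would use. */
static char radix_char(const struct numeric_sketch *n)
{
    return n->radix_is_comma ? ',' : '.';
}

/* Hypothetical load-time parser for a locale's radix value. Anything other
 * than exactly "." or "," is rejected and the setting is left unchanged.
 * Returns 0 on success, -1 on rejection. */
static int set_radix(struct numeric_sketch *n, const char *value)
{
    if (value[0] == '.' && value[1] == '\0') {
        n->radix_is_comma = 0;
        return 0;
    }
    if (value[0] == ',' && value[1] == '\0') {
        n->radix_is_comma = 1;
        return 0;
    }
    return -1;
}
```

For example, after set_radix(&n, ","), radix_char(&n) yields ','; a later set_radix(&n, "1") returns -1 and leaves the radix as ','.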
> 
> As written to support arbitrary radix characters, the patch also fails
> to handle the case where the radix character is multi-byte, copying
> only a single byte of it and thereby producing broken output. This is
> actually a nasty case where printf semantics for field width are not
> what the caller is likely to expect, and it breaks our wide printf
> implementation, which assumes when it uses byte-based printf for
> numbers that the byte count and character count are the same.
> Supporting only '.' and ',' avoids all of these issues, too.
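To illustrate the field-width problem, a small sketch assuming U+066B ARABIC DECIMAL SEPARATOR (0xD9 0xAB in UTF-8) as an example multi-byte radix and a hypothetical helper demo(): even if the whole UTF-8 sequence were copied, byte-based printf pads to a width counted in bytes, so the result occupies fewer character positions than the caller asked for.

```c
#include <stdio.h>

/* Count UTF-8 characters by skipping continuation bytes (10xxxxxx).
 * Assumes the input is valid UTF-8. */
static size_t utf8_chars(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Hypothetical helper: build "1<radix>5" and right-align it in a field of
 * width 6 with byte-based snprintf, which counts the width in bytes. */
static void demo(char *buf, size_t size, const char *radix)
{
    char num[16];
    snprintf(num, sizeof num, "1%s5", radix);
    snprintf(buf, size, "%6s", num);
}
```

With radix "." the result is 6 bytes and 6 characters; with "\xd9\xab" it is still 6 bytes but only 5 characters, which is exactly the mismatch a wide printf built on top of byte-based number formatting cannot tolerate.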
> 
> Another detail you've overlooked is that scanf/strto{d,ld,f}/atof need
> to process the radix point character. This in turn requires fixing the
> _l wrappers for strto{d,ld,f} so that they actually apply the locale
> argument rather than ignoring it.
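The round-trip failure on the input side can be sketched as follows, assuming the program stays in the default "C" locale (no setlocale call) and simulating ','-radix printf output by swapping the '.' after formatting; round_trips() is a hypothetical helper. strtod stops at the comma, so the localized output does not parse back to the same value.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: format x with enough digits to round-trip, replace
 * the '.' with the given radix to mimic localized printf output, then parse
 * it back with strtod in the "C" locale. Returns 1 only if the whole string
 * was consumed and the original value came back. */
static int round_trips(double x, char radix)
{
    char buf[64], *p, *end;

    snprintf(buf, sizeof buf, "%.17g", x);
    p = strchr(buf, '.');
    if (p)
        *p = radix;

    double y = strtod(buf, &end);
    return *end == '\0' && y == x;
}
```

round_trips(3.14, '.') succeeds, while round_trips(3.14, ',') fails because strtod returns 3.0 and leaves end pointing at the ','.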
> 
> Before proceeding on all of this we should probably try to reach a
> decision on whether it's really needed/wanted functionality.

One other small detail that's not important at this point but I want
to make sure isn't forgotten: whatever form the radix point might take
in the locale file, it (and anything else needed programmatically for
internal libc use in hot paths) should be parsed at locale loading
time and stored to a (new) member of struct __locale_map rather than
looking it up each time with LC_TRANS. Since it and other aspects of
LC_NUMERIC (probably none we would support anyway) and LC_MONETARY
(which does have things we should support) need to be reflected in
localeconv() too, probably the whole localeconv struct should live in
struct __locale_map and be initialized at locale load time. I don't
see any other point at which storage for it could be allocated, since
localeconv() is not permitted to fail.
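A rough sketch of load-time initialization, assuming a hypothetical struct locale_map_sketch standing in for the real struct __locale_map (whose other members are not shown and not assumed here). Every struct lconv field is filled when the locale is loaded, with "C"-locale values except for the decimal point, so the localeconv()-style getter only returns a pointer and has no failure path.

```c
#include <limits.h>
#include <locale.h>

/* Hypothetical stand-in for struct __locale_map; only the new member is shown. */
struct locale_map_sketch {
    struct lconv conv;  /* filled once, at locale load time */
};

/* Hypothetical load-time hook: nonzero radix_is_comma selects ','.
 * All other fields get the C-locale values ("" or CHAR_MAX). */
static void load_numeric(struct locale_map_sketch *m, int radix_is_comma)
{
    struct lconv *c = &m->conv;

    c->decimal_point = radix_is_comma ? "," : ".";
    c->thousands_sep = "";
    c->grouping = "";
    c->mon_decimal_point = "";
    c->mon_thousands_sep = "";
    c->mon_grouping = "";
    c->positive_sign = "";
    c->negative_sign = "";
    c->currency_symbol = "";
    c->int_curr_symbol = "";
    c->frac_digits = CHAR_MAX;
    c->p_cs_precedes = CHAR_MAX;
    c->n_cs_precedes = CHAR_MAX;
    c->p_sep_by_space = CHAR_MAX;
    c->n_sep_by_space = CHAR_MAX;
    c->p_sign_posn = CHAR_MAX;
    c->n_sign_posn = CHAR_MAX;
    c->int_frac_digits = CHAR_MAX;
    c->int_p_cs_precedes = CHAR_MAX;
    c->int_n_cs_precedes = CHAR_MAX;
    c->int_p_sep_by_space = CHAR_MAX;
    c->int_n_sep_by_space = CHAR_MAX;
    c->int_p_sign_posn = CHAR_MAX;
    c->int_n_sign_posn = CHAR_MAX;
}

/* Getter in the spirit of localeconv(): no allocation, so nothing can fail. */
static struct lconv *localeconv_sketch(struct locale_map_sketch *m)
{
    return &m->conv;
}
```

The design point is that all storage lives inside the already-allocated locale map, so the only place anything can fail is locale loading itself.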

Rich
