Date: Wed, 11 Sep 2019 07:44:37 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: printf doesn't respect locale

On Wed, Sep 11, 2019 at 12:07:22PM +0200, Jens Gustedt wrote:
> Hello Szabolcs,
> 
> On Wed, 11 Sep 2019 12:01:59 +0200 Szabolcs Nagy <nsz@...t70.net> wrote:
> 
> > > We would be *extremely* disappointed if LC_NUMERIC would never be
> > > supported in upstream musl.  We would have to maintain a patch to
> > > add LC_NUMERIC support when the rest of musl's locale support is
> > > developed.  
> > 
> > i consider this a posix/iso c bug.
> 
> I agree
> 
> > there is a need for printf with fixed C.UTF-8 locale in
> > library code that implements a file format, language or
> > protocol that cannot be locale dependent.
> > 
> > in iso c there is no way to get this.
> > 
> > in posix 2008 you have to jump through very bizarre hoops
> > to get it (in a slow and resource wasting way).
> > 
> > so the world is full of printf users that just expect
> > fixed C.UTF-8 locale and hope nobody calls setlocale.
> > 
> > telling ppl that their code is wrong does not help unless
> > you provide an alternative, but introducing new api for
> > this would not be portable.
> 
> I think that WG14 would be happy to hear any suggestions how we could
> get out of this trap, a proposal for C2x would even be better.

The obvious solution is a modifier character to printf/scanf format
strings that applies to numeric conversions and means "always
format/interpret this as if in the C locale". However this is hard to
test for at build time unless there's a macro declaring its
availability, so ideally WG14 would also adopt the sort of
fine-grained feature availability macros some of us have been
proposing for extensions.

An alternative/additional solution, which I actually might like
better, is having a function which sets a thread-local flag to treat
certain locale properties (at least the problematic LC_NUMERIC ones)
as if the current locale were "C". This is weaker than the uselocale
API from POSIX, but it avoids that API's possibility of failure
(likely with no way to make forward progress), and more importantly,
it would not *break* m17n/i18n functionality the way switching the
whole locale does, by turning off other unrelated, non-problematic
locale features.
Application or library code could then just set/restore this flag
around *printf/*scanf/strto*/etc calls, or could set it and leave it
if they never want to see ',' again.
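
For comparison, here is a rough sketch of what the set/restore pattern
looks like today with the POSIX 2008 calls, keeping the other locale
categories intact. The helper name format_double_c and the "%.2f"
format are made up for illustration; on some systems locale_t may need
<xlocale.h> instead. Note the two allocating calls that can fail, which
is the failure problem mentioned above.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format a double with LC_NUMERIC forced to "C"
 * for this call only, leaving the other categories of the calling
 * thread's locale untouched. Returns the snprintf result, or -1 if a
 * temporary locale object could not be allocated. */
static int format_double_c(char *buf, size_t size, double value)
{
    /* Copy the locale currently in effect for this thread; this may be
     * the global locale, which duplocale is allowed to copy. */
    locale_t base = duplocale(uselocale((locale_t)0));
    if (base == (locale_t)0)
        return -1;

    /* Replace only LC_NUMERIC in the copy. On success the base object
     * is consumed; on failure it is left unchanged and must be freed. */
    locale_t c_num = newlocale(LC_NUMERIC_MASK, "C", base);
    if (c_num == (locale_t)0) {
        freelocale(base);
        return -1;
    }

    locale_t old = uselocale(c_num);   /* affects this thread only */
    int n = snprintf(buf, size, "%.2f", value);
    uselocale(old);                    /* restore the previous setting */
    freelocale(c_num);
    return n;
}
```

With a thread-local flag as described above, the duplocale/newlocale
steps and both error paths would presumably disappear, leaving only the
set and restore around the snprintf call.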

Rich
