Date: Mon, 30 Nov 2020 14:14:15 -0300
From: Érico Nogueira <ericonr@...root.org>
To: <musl@...ts.openwall.com>
Cc: "Samuel Holland" <samuel@...lland.org>, "Dong Brett"
 <brett.browning.dong@...il.com>
Subject: Re: Question on C++ locale

On Mon Nov 30, 2020 at 12:35 PM -03, Rich Felker wrote:
> On Mon, Nov 30, 2020 at 12:12:50PM -0300, Érico Nogueira wrote:
> > On Mon Nov 30, 2020 at 11:39 AM -03, Samuel Holland wrote:
> > > On 11/30/20 7:44 AM, Érico Nogueira wrote:
> > > > On Mon Nov 30, 2020 at 8:35 AM -03, Szabolcs Nagy wrote:
> > > >> * Dong Brett <brett.browning.dong@...il.com> [2020-11-30 18:41:33 +0800]:
> > > >>> However, the following C++ code does not work (our software uses std::locale in C++ standard library for locale related stuff):
> > > >>> #include <langinfo.h>
> > > >>> #include <locale.h>
> > > >>> #include <locale>
> > > >>> using namespace std;
> > > >>> int main()
> > > >>> {
> > > >>>     std::locale::global(locale(""));
> > > >>>     initscr();
> > > >>>     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> > > >>>     printw("C++ locale: %s\n", locale().name().c_str());
> > > >>>     printw("CODESET: %s\n", nl_langinfo(CODESET));
> > > >>>     printw("Hello, world!\n");
> > > >>>     printw("你好,世界!\n");
> > > >>>     refresh();
> > > >>>     getch();
> > > >>>     endwin();
> > > >>>     return 0;
> > > >>> }
> > > >>
> > > >> fwiw for me even the first line fails.
> > > >> i don't know how c++ locales are supposed to work.
> > > > 
> > > > From [1], it seems that C++ locales are supposed to affect the global
> > > > locale as well, so they should call setlocale() when appropriate.
> > > > 
> > > > - [1] https://www.cplusplus.com/reference/locale/locale/
> > > > 
> > > > Unfortunately, I assume libstdc++ uses their generic locale support on
> > > > musl...  From gcc-10.2.0/libstdc++-v3/config/locale/generic/c_locale.cc:
> > > > 
> > > >   void
> > > >   locale::facet::_S_create_c_locale(__c_locale& __cloc, const char* __s,
> > > > 				    __c_locale)
> > > >   {
> > > >     // Currently, the generic model only supports the "C" locale.
> > > >     // See http://gcc.gnu.org/ml/libstdc++/2003-02/msg00345.html
> > > >     __cloc = 0;
> > > >     if (strcmp(__s, "C"))
> > > >       __throw_runtime_error(__N("locale::facet::_S_create_c_locale "
> > > > 			    "name not valid"));
> > > >   }
> > > > 
> > >
> > > I don't know for sure that it's the right thing to do, but I have been
> > > patching out that error for the last several years[1] and so far I have
> > > not noticed any negative effects. Adelie, which is very thorough about
> > > testing, has also carried the patch for a while[2].
> > >
> > > Samuel
> > >
> > > [1]:
> > > https://github.com/smaeul/portage/blob/c744774a/patches/sys-devel/gcc/gcc-5.4.0-locale.patch
> > > [2]: https://code.foxkit.us/adelie/packages/-/commit/d09b437d
> > 
> > Are those patches correct in functionality? The GNU version is:
> > 
> >   void
> >   locale::facet::_S_create_c_locale(__c_locale& __cloc, const char* __s,
> > 				    __c_locale __old)
> >   {
> >     __cloc = __newlocale(1 << LC_ALL, __s, __old);
> >     if (!__cloc)
> >       {
> > 	// This named locale is not supported by the underlying OS.
> > 	__throw_runtime_error(__N("locale::facet::_S_create_c_locale "
> > 				  "name not valid"));
> >       }
> >   }
> > 
> > It tries to create a locale object, which the generic code doesn't do.
> > In the generic case, _S_create_c_locale is basically a noop, and I'd
> > assume localization wouldn't work, even if it does avoid the runtime
> > abort.
> > 
> > I will try it out locally when I get the time.
>
> The code there in the GNU version is correct (the one without
> newlocale isn't correct) aside from having the __ prefix, but other
> parts of the GNU version are wrong in that they poke at glibc
> internals to "optimize" useless byte-based ctype functions (useless
> because they can't operate on the only characters whose properties
> could vary by locale, the non-ASCII ones). There should probably be a
> new "posix" directory here based on the GNU one but with all the
> GNUisms removed. If it's not hard to backport that to older GCC
> versions maybe we should do that.

C++ is a bit mysterious to me; do you think there's a chance that
changing the libstdc++ locale implementation could break programs
built for the old version?

I also wonder what the configure script should look for in order to
choose which version to use.

From a really quick look at _S_create_c_locale, the dragonfly version
might be usable for this purpose, although it uses some non-standard
headers.
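
For illustration, a minimal sketch of what a POSIX-only equivalent of the
GNU _S_create_c_locale might look like, assuming only newlocale() and
LC_ALL_MASK from <locale.h>. The function name create_c_locale is
hypothetical and this is not libstdc++ code; it only shows the idea of
creating a real locale object and throwing when the name is rejected.

```cpp
#include <locale.h>    // newlocale, freelocale, LC_ALL_MASK (POSIX.1-2008)
#include <stdexcept>   // std::runtime_error

// Hypothetical stand-in for locale::facet::_S_create_c_locale.
// Creates a locale object for `name`, optionally modifying `base`,
// and throws if the underlying C library rejects the name.
locale_t create_c_locale(const char* name, locale_t base = (locale_t)0)
{
    locale_t loc = newlocale(LC_ALL_MASK, name, base);
    if (loc == (locale_t)0)
        throw std::runtime_error("create_c_locale: name not valid");
    return loc;
}
```

The caller owns the returned locale_t and would release it with
freelocale() once it is no longer needed.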

>
> One thing: I think in order for std::locale::global to be able to
> work, the locale creation code also needs to store the name (string)
> passed to locale() constructor, since there's no way to setlocale to a
> locale_t. Instead you need to remember the name so you can setlocale()
> to the same name. Perhaps NL_LOCALE_NAME would suffice, but I don't
> think it can easily give the exact same behavior since it's
> per-category.
>
> Rich
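
A rough sketch of the name-remembering idea from the quoted paragraph,
under the assumption that the locale object and the string it was
created from are simply kept side by side. The names named_locale,
make_named_locale, install_global and destroy are all made up for this
example; nothing here reflects how libstdc++ actually stores its state.

```cpp
#include <locale.h>   // setlocale, newlocale, freelocale, LC_ALL_MASK
#include <string>

// Hypothetical pairing of a locale_t with the name it was created from,
// since a locale_t alone cannot be handed to setlocale().
struct named_locale {
    std::string name;
    locale_t handle;
};

// Creates the locale object and remembers the requested name.
// The handle is 0 if the C library does not support the name.
named_locale make_named_locale(const char* name)
{
    return named_locale{name, newlocale(LC_ALL_MASK, name, (locale_t)0)};
}

// Makes the remembered name the process-wide C locale.
// Returns false if the locale was never created or setlocale() fails.
bool install_global(const named_locale& nl)
{
    return nl.handle != (locale_t)0
        && setlocale(LC_ALL, nl.name.c_str()) != nullptr;
}

// Releases the locale object; the stored name is left untouched.
void destroy(named_locale& nl)
{
    if (nl.handle != (locale_t)0) {
        freelocale(nl.handle);
        nl.handle = (locale_t)0;
    }
}
```

Storing the string avoids having to reconstruct a name from per-category
queries later, at the cost of one extra allocation per locale.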
