Date: Wed, 09 Dec 2020 11:35:57 -0300
From: Érico Nogueira <ericonr@...root.org>
To: <musl@...ts.openwall.com>, "Dong Brett" <brett.browning.dong@...il.com>
Subject: Re: Question on C++ locale

On Mon Nov 30, 2020 at 11:51 AM -03, Rich Felker wrote:
> On Mon, Nov 30, 2020 at 06:41:33PM +0800, Dong Brett wrote:
> > Hi all,
> > 
> > I am troubleshooting a locale-related issue in our C++ software when building with musl. With some effort I narrowed the problem down to the inability to set a UTF-8 locale in the C++ standard library.
> > 
> > The following C code prints UTF-8 characters correctly:
> > #include <ncurses.h>
> > #include <langinfo.h>
> > #include <locale.h>
> > 
> > int main()
> > {
> >     setlocale(LC_ALL, "");
> >     initscr();
> >     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> >     printw("CODESET: %s\n", nl_langinfo(CODESET));
> >     printw("Hello, world!\n");
> >     printw("你好,世界!\n");
> >     refresh();
> >     getch();
> >     endwin();
> >     return 0;
> > }
> > 
> > Giving the output of
> > LC_ALL: C.UTF-8;C;C;C;C;C
> > CODESET: UTF-8
> > Hello, world!
> > 你好,世界!
> > 
> > However, the following C++ code does not work (our software uses std::locale in C++ standard library for locale related stuff):
> > #include <ncurses.h>
> > #include <langinfo.h>
> > #include <locale.h>
> > #include <locale>
> > using namespace std;
> > int main()
> > {
> >     std::locale::global(locale(""));
> >     initscr();
> >     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
> >     printw("C++ locale: %s\n", locale().name().c_str());
> >     printw("CODESET: %s\n", nl_langinfo(CODESET));
> >     printw("Hello, world!\n");
> >     printw("你好,世界!\n");
> >     refresh();
> >     getch();
> >     endwin();
> >     return 0;
> > }
> > 
> > Giving a corrupted output:
> > LC_ALL: C
> > C++ locale: C
> > CODESET: ASCII
> > Hello, world!
> > 你好?~L?~V?~U~L!
> > 
> > It seems that only the ASCII C locale is available in C++. If I run the above C++ code with LANG="C.UTF-8", an exception is thrown and the program is aborted:
> > terminate called after throwing an instance of 'std::runtime_error'
> >   what():  locale::facet::_S_create_c_locale name not valid
> > Aborted
> > 
> > I also tried LANG="UTF-8" and LANG="en_US.UTF-8", but neither of
> > those works. Only LANG="C" lets the program run, but then only ASCII
> > characters are supported.
> > 
> > My question is: is there a way to make locales in the C++ standard
> > library work with musl? Or have I done something wrong?
>
> Thanks for raising this. Indeed you've uncovered a (pile of) bug(s) in
> libstdc++, but they don't seem to be relevant to your usage with
> ncurses. Being a C library, not a C++ one, curses behavior depends on
> the locale as set through the C/POSIX mechanisms, setlocale and/or
> newlocale/uselocale. You shouldn't be using C++'s locale framework for
> this. Any program using ncurses should start with either
> setlocale(LC_ALL,"") or setlocale(LC_CTYPE,"") (depending on whether
> you want the behavior of the other categories).
>
> I'll try to figure out what we need to do to get this fixed in
> libstdc++. Since it's never been reported before, I suspect just very
> few programs are using the C++ locale API so hopefully at least the
> problem is low-impact.
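
For reference, here is a minimal sketch of the C/POSIX route Rich describes, written so that it compiles as C++. It leaves ncurses out so it can be tried anywhere; the helper name environment_codeset is made up for illustration and is not from the thread.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical helper (not from the thread): select the locale named by the
// environment through the C API, which is what curses consults, and report
// the resulting character encoding.
std::string environment_codeset()
{
    // A null return means the requested locale was rejected; the previous
    // locale (initially "C") then stays in effect.
    std::setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```

Calling something like this before initscr() and checking the result would be one way to confirm which encoding curses is going to assume.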

As another data point for an application that uses C++ locales, there is
snapper. From [1]:

    try
    {
        locale::global(locale(""));
    }
    catch (const runtime_error& e)
    {
        cerr << _("Failed to set locale. Fix your system.") << endl;
    }

Fortunately, they wrap the call in a try-catch, which should also catch
other failures such as invalid LANG values, if I understand correctly.  I
wonder whether other applications that use the API usually have such a
block, since it can hide the error from the user.
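
As a rough sketch only (this is not what snapper does, and the function name is hypothetical), an application could use the same try-catch to fall back to the C API when the C++ locale constructor throws, so that C libraries such as ncurses still see the environment's locale:

```cpp
#include <clocale>
#include <locale>
#include <stdexcept>

// Hypothetical helper: returns true if the C++ global locale was set from
// the environment, false if we had to fall back to plain setlocale().
bool init_locale_with_fallback()
{
    try {
        // Throws std::runtime_error if the C++ library rejects the name.
        std::locale::global(std::locale(""));
        return true;
    } catch (const std::runtime_error&) {
        // The C++ global locale is left unchanged; only the C locale,
        // which C libraries consult, is set from the environment.
        std::setlocale(LC_ALL, "");
        return false;
    }
}
```

The return value lets the caller decide whether to warn the user, rather than silently continuing as if nothing had gone wrong.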

That said, I don't think the project can be built on musl without any
external patches yet (some pieces relied heavily on glibc extensions),
so having locale issues isn't the biggest problem with snapper on musl.

- [1] https://github.com/openSUSE/snapper/blob/9e795ed4f0d87e6afcd5065f26c1350942f8ab38/client/snapper.cc#L126

>
> Rich

Érico
