Date: Mon, 30 Nov 2020 09:51:26 -0500
From: Rich Felker <dalias@...c.org>
To: Dong Brett <brett.browning.dong@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: Question on C++ locale

On Mon, Nov 30, 2020 at 06:41:33PM +0800, Dong Brett wrote:
> Hi all,
> 
> I am troubleshooting a locale-related issue in our C++ software when building with musl. With some effort I narrowed the problem down to being unable to set a UTF-8 locale through the C++ standard library.
> 
> The following C code prints UTF-8 characters correctly:
> #include <ncurses.h>
> #include <langinfo.h>
> #include <locale.h>
> 
> int main()
> {
>     setlocale(LC_ALL, "");
>     initscr();
>     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
>     printw("CODESET: %s\n", nl_langinfo(CODESET));
>     printw("Hello, world!\n");
>     printw("你好,世界!\n");
>     refresh();
>     getch();
>     endwin();
>     return 0;
> }
> 
> Giving the output of
> LC_ALL: C.UTF-8;C;C;C;C;C
> CODESET: UTF-8
> Hello, world!
> 你好,世界!
> 
> However, the following C++ code does not work (our software uses std::locale in C++ standard library for locale related stuff):
> #include <ncurses.h>
> #include <langinfo.h>
> #include <locale.h>
> #include <locale>
> using namespace std;
> int main()
> {
>     std::locale::global(locale(""));
>     initscr();
>     printw("LC_ALL: %s\n", setlocale(LC_ALL, NULL));
>     printw("C++ locale: %s\n", locale().name().c_str());
>     printw("CODESET: %s\n", nl_langinfo(CODESET));
>     printw("Hello, world!\n");
>     printw("你好,世界!\n");
>     refresh();
>     getch();
>     endwin();
>     return 0;
> }
> 
> Giving a corrupted output:
> LC_ALL: C
> C++ locale: C
> CODESET: ASCII
> Hello, world!
> 你好?~L?~V?~U~L!
> 
> It seems only the ASCII C locale is available in C++. If I run the above C++ code with LANG="C.UTF-8", an exception is thrown and the program aborts:
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  locale::facet::_S_create_c_locale name not valid
> Aborted
> 
> I also tried LANG="UTF-8" and LANG="en_US.UTF-8", but neither
> works. Only LANG="C" lets the program run, but then only ASCII
> characters are supported.
> 
> My question: is there a way to make the C++ standard library's
> locale support work with musl? Or have I done something wrong?

Thanks for raising this. Indeed you've uncovered a (pile of) bug(s) in
libstdc++, but they don't seem to be relevant to your usage with
ncurses. Because curses is a C library, not a C++ one, its behavior
depends on the locale as set through the C/POSIX mechanisms: setlocale
and/or newlocale/uselocale. You shouldn't be using C++'s locale
framework for this. Any program using ncurses should start with either
setlocale(LC_ALL,"") or setlocale(LC_CTYPE,"") (depending on whether
you also want the other locale categories to take effect).

I'll try to figure out what we need to do to get this fixed in
libstdc++. Since it's never been reported before, I suspect very few
programs use the C++ locale API, so hopefully the problem is at least
low-impact.

Rich
