musl - Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

Message-ID: <CAPG2z0-2iGiVXBhFnaMWF_wVPCs6AgNMqobLJoWHLrmeR=Uy+A@mail.gmail.com>
Date: Mon, 30 Jan 2017 08:37:42 +0800
From: He X <xw897002528@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

> I'm not saying you need to wait....
1. It's hard for me to read that thread, I only glanced at it once. Thanks
for your advice, I'll be more cautious next time! ;p

> Can I ask how .UTF-8 got in the locale name....
2. The '.UTF-8' was copied from glibc's locale table. I put it there; it's
the kind of value a normal user sets. Looking into musl's source, I found
that such a suffix is totally useless for musl: suffixes are meaningless
to it. But in my view we should still be compatible with glibc, since a
suffix seems to be an unofficial but standard way to ask libc for a
particular charset.

> I don't think "it crashes on glibc"...
3. Really sorry, I forgot to run locale-gen before testing; that's why it
segfaulted. It seems glibc only stripped '.GBK' at translation load time,
and it showed me '»ỰѡÏî:'. In other words, it was using the real GBK
charset!

Though I agree with rejection: musl is UTF-8 only, and this '.GBK' asks
for GBK rather than UTF-8, so conceptually we should just reject it.
But from a normal user's point of view, rewriting is nice because it
avoids a failure. And developers using musl should already know that musl
has no non-UTF-8 charsets, rather than depending on libc for that, so I
like the idea of rewriting. Or should we leave the responsibility of
setting the right LC_* to users? Not so friendly...

Because users may want to validate the strings returned by setlocale()...
So the best time to do the rewriting, I think, is at translation time.

> Re: the original patch, it should probably...
4. Makes sense. I'm not a pro coder, I hadn't thought about using strchr
or strcmp! :)

And with the idea above, I suggest using strchr to strip everything after
the '.'. That is good enough, and we don't need to care about what comes
after the '.', since whatever the user asked for, musl is using UTF-8.
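
To illustrate the strchr idea, here is a minimal sketch, not the actual
musl patch. The helper name strip_codeset and the copy-into-a-buffer
interface are assumptions made for this example only, and it truncates at
the '.' itself so that the dot is removed together with the suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not musl code): copy `name` into `buf`, dropping
 * everything from the first '.' onward. Returns 0 on success, or -1 if
 * `buf` is too small to hold the result. */
static int strip_codeset(const char *name, char *buf, size_t buflen)
{
    assert(name && buf);

    /* strchr returns NULL when there is no '.', so keep the whole name. */
    const char *dot = strchr(name, '.');
    size_t n = dot ? (size_t)(dot - name) : strlen(name);

    if (n >= buflen)
        return -1;

    memcpy(buf, name, n);
    buf[n] = '\0';
    return 0;
}
```

With this sketch, "zh_CN.UTF-8" and "zh_CN.GBK" both become "zh_CN", and
a name without a '.' is copied unchanged. One thing to keep in mind: a
name such as "de_DE.UTF-8@euro" would lose its "@euro" modifier too,
because everything after the '.' is dropped.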

2017-01-30 0:37 GMT+08:00 Rich Felker <dalias@...c.org>:

> On Sun, Jan 29, 2017 at 10:48:34PM +0800, He X wrote:
> > btw, with 'p-> to q->', 'strip .UTF-8'(these two in the first thread),
> > and these two patches, fcitx, chromium are working well.
>
> Can I ask how .UTF-8 got in the locale name to begin with? Did you put
> it there, or was it copied from another non-glibc system you logged in
> from, or did chromium itself add it?
>
> Re: the original patch, it should probably (depending on what we want
> to do with other invalid encodings) either use strchr to find the
> first '.' and strip everything after it, or something like:
>
>         if (loclen > 6 && !strcmp(locname+loclen-6, ".UTF-8"))
>
> There's no reason to pull strstr in here.
>
> Rich
>
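
For comparison, the strcmp check quoted above can be wrapped into a small
function. This is only a sketch: the name len_without_utf8_suffix and the
choice to return a length are assumptions for illustration, while the
condition itself is taken from the quoted line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not musl code): return the length of `locname`
 * with a trailing ".UTF-8" excluded, or its full length if the name
 * does not end in exactly that suffix. */
static size_t len_without_utf8_suffix(const char *locname)
{
    assert(locname);

    size_t loclen = strlen(locname);

    /* Same test as in the quoted mail: the name must be longer than the
     * 6-byte suffix, and its last 6 bytes must equal ".UTF-8". */
    if (loclen > 6 && !strcmp(locname + loclen - 6, ".UTF-8"))
        return loclen - 6;

    return loclen;
}
```

Unlike the strchr approach, this only touches names ending in exactly
".UTF-8". The comparison is case-sensitive, so ".utf8" is left alone, and
so is ".GBK".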