Date: Mon, 13 Feb 2017 12:08:25 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

On Sun, Feb 12, 2017 at 02:56:53PM +0800, He X wrote:
> 1. cat is added to the keys, also do a validate
> 2. so we what do we deal with the gettextdir() exactly? inline it or
> construct a gettextpointer()?
> 3. i added a extra locbuf array, and goto is replaced by a loop, memcpy is
> replaced by snprintf, compiled, and working well with fcitx

I haven't verified the loop logic yet, but at a high level it looks
correct.

> 4. i just found that i forgot to store the keys to new buffer, it's ok to
> just use normal expression? or we need atomic operations?
> ```
> + p->cat = category;
> + p->binding = q;
> + p->lm = lm;
> ```

This is fine, since the new msgcat is not visible to other threads until
it's installed with an atomic, which makes all previous writes visible.
I do want to rework this all with a lock structure rather than atomics,
but that's a separate project.

> 5. I do want to rewrite all to .UTF8, but it's a bit annoying as your
> words, then i changed the code to simply strip.

Since this part is separate and there seems to be disagreement about
what it should do, let's separate it from the issue at hand; it's
really a separate change from making gettext do proper fallbacks
anyway.

> > (safe for the user's terminal)
> LANG is set by users who are using musl and it's modified to zh_CN at
> setlocale(), app will use UTF8 directly, there's no such situation where
> charset will cause troubles to users' terminal, except apps which get the
> LANG manually by getenv(). I have not seen such strange applications so
> far, and most apps only have the UTF8 translation files.
>
> For moving from glibc to musl, i think doing this way is good for now, we
> could delete it later, or just keep it forever. And most people won't use
> non-UTF8 at all, if they do use GBK, their app will even fallback to UTF8,
> because no translation files for GBK. So, it's not so dagerous, i think :).

The main considerations are:

1. what happens when a glibc user ssh's into a musl-based system
2. what happens when a musl user ssh's into a glibc-based system
3. what happens when running musl binaries on a glibc-based system

For #1 and #3, it's desirable for musl to accept ".UTF-8" in the
locale name, and for #2, users may desire to have ".UTF-8" in their
LC_* env vars so that remote glibc programs behave correctly.

For #1 and #3, if a glibc user is using a legacy non-UTF-8 locale and
runs a musl program, they're either going to get messed-up output or
ASCII-only, depending on decisions we make and/or what their locale
value is. These cases are not really important since legacy encodings
are not supported, but it might be nice to make them least-bad.

If the user has a locale name like "fr_FR" or "zh_CN", that's going to
be interpreted differently by musl vs glibc; that was already decided a
long time ago in the interest of designing around the future rather
than broken legacy stuff. But if the locale name is explicitly
non-UTF-8, like "zh_CN.GBK", we could opt to reject it without breaking
anything, and this may give users better feedback about what's going
wrong if they have such settings when ssh'ing into a musl-based system.

Rich
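
As an illustration of the point made above about item 4 (plain stores into a not-yet-published msgcat are fine as long as the final install is an atomic operation), here is a minimal sketch. It uses C11 `<stdatomic.h>` rather than musl's internal atomic primitives, and the names `msgcat_sketch`, `cats_head`, `install_cat` and `find_cat` are hypothetical; this is not the actual musl code.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the msgcat record discussed in the thread. */
struct msgcat_sketch {
    struct msgcat_sketch *next;
    int cat;
    const void *binding;
    const void *lm;
};

/* Head of a singly linked list shared by all threads (starts out NULL). */
static _Atomic(struct msgcat_sketch *) cats_head;

struct msgcat_sketch *install_cat(int category, const void *binding,
                                  const void *lm)
{
    struct msgcat_sketch *p = malloc(sizeof *p);
    if (!p) return NULL;

    /* Plain stores: no other thread can reach p yet. */
    p->cat = category;
    p->binding = binding;
    p->lm = lm;

    /* Publish with a release CAS; every store above becomes visible
     * to any thread that later loads the head with acquire. */
    struct msgcat_sketch *old =
        atomic_load_explicit(&cats_head, memory_order_relaxed);
    do {
        p->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &cats_head, &old, p,
                 memory_order_release, memory_order_relaxed));
    return p;
}

struct msgcat_sketch *find_cat(int category, const void *binding,
                               const void *lm)
{
    /* The acquire load pairs with the release CAS in install_cat(). */
    struct msgcat_sketch *p =
        atomic_load_explicit(&cats_head, memory_order_acquire);
    for (; p; p = p->next)
        if (p->cat == category && p->binding == binding && p->lm == lm)
            return p;
    return NULL;
}
```

The key fields are written once before publication and never modified afterwards, which is why ordinary assignments suffice for them.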
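
The locale-name policy floated at the end of the message (accept ".UTF-8", treat an explicitly non-UTF-8 codeset such as ".GBK" as rejectable) could be sketched roughly as follows. This is only an illustration under stated assumptions, not musl's implementation: the enum, the function name `classify_locale_name`, and the choice to match "UTF-8" and "UTF8" case-insensitively are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; none of these names come from musl. */
enum codeset_class {
    CODESET_NONE,   /* "zh_CN": no codeset suffix at all */
    CODESET_UTF8,   /* "zh_CN.UTF-8": suffix could be accepted/stripped */
    CODESET_LEGACY  /* "zh_CN.GBK": explicitly non-UTF-8 */
};

/* ASCII-only lowercase, so the result does not depend on the locale. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Compare the n bytes at s, ignoring ASCII case, against the
 * lowercase NUL-terminated literal lit. */
static int eq_nocase(const char *s, size_t n, const char *lit)
{
    if (strlen(lit) != n) return 0;
    for (size_t i = 0; i < n; i++)
        if (ascii_lower((unsigned char)s[i]) != lit[i]) return 0;
    return 1;
}

enum codeset_class classify_locale_name(const char *name)
{
    const char *dot = strchr(name, '.');
    if (!dot) return CODESET_NONE;

    /* The codeset runs from just after '.' up to an optional '@modifier'. */
    const char *cs = dot + 1;
    size_t n = strcspn(cs, "@");

    /* Assumption: both spellings seen in the thread count as UTF-8. */
    if (eq_nocase(cs, n, "utf-8") || eq_nocase(cs, n, "utf8"))
        return CODESET_UTF8;
    return CODESET_LEGACY;
}
```

With this sketch, "zh_CN" yields `CODESET_NONE`, "zh_CN.UTF-8" yields `CODESET_UTF8`, and "zh_CN.GBK" yields `CODESET_LEGACY`; what a caller does with `CODESET_LEGACY` (reject, or fall back to ASCII-only) is exactly the open decision discussed above.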