Date: Sat, 18 Mar 2017 21:50:28 +0800
From: He X <xw897002528@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

OK, I think there is no need for further discussion. I understand your
point, if this is how musl wants to behave. I will try to make patches for
vim later! As for the `charset=` check, I can't help: I did not understand
what is going on in __mo_lookup(). I hope you can make that patch. In the
attached patch, everything related to dropping .charset has been removed.
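
A minimal sketch of the suffix rule from the quoted message below (the
.charset part comes before the @mod and is never used in pathnames). This
is not code from musl or from the attached patch: `strip_charset` is a
hypothetical helper name, and it assumes locale names of the form
lang[_TERRITORY][.charset][@modifier].

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from musl or locale.diff): copy `name` into
 * `out` without its ".charset" part, keeping any "@modifier".
 * Returns 0 on success, -1 if `out` is too small. */
int strip_charset(const char *name, char *out, size_t outsz)
{
    size_t len = strlen(name);
    const char *dot = strchr(name, '.');
    const char *at = strchr(name, '@');

    /* A '.' that appears after '@' is part of the modifier, not a charset. */
    if (dot && at && dot > at)
        dot = NULL;

    if (!dot) {
        /* Nothing to drop: copy the name unchanged. */
        if (len >= outsz)
            return -1;
        memcpy(out, name, len + 1);
        return 0;
    }

    size_t head = (size_t)(dot - name);      /* "lang_TERRITORY" */
    const char *tail = at ? at : name + len; /* "@modifier" or "" */
    size_t tail_len = strlen(tail);

    if (head + tail_len >= outsz)
        return -1;
    memcpy(out, name, head);
    memcpy(out + head, tail, tail_len + 1);  /* includes the NUL */
    return 0;
}
```

With this sketch, "zh_CN.UTF-8" becomes "zh_CN", "sr_RS.UTF-8@latin"
becomes "sr_RS@latin", and "de_DE" is left unchanged.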

2017-03-18 20:28 GMT+08:00 Rich Felker <dalias@...c.org>:

> On Sat, Mar 18, 2017 at 07:34:58AM +0000, He X wrote:
> > > As discussed on irc, .charset suffixes should be dropped before the
> > > loop even begins (never used in pathnames), and they occur before the
> > > @mod, not after it, so the logic for dropping them is different.
> >
> > 1. drop .charset: Sorry for proposing it again; I had forgotten this
> > case after around three weeks. As I said before, vim will generate three
> > different .mo files with different charsets -> zh_CN.UTF-8.po,
> > zh_CN.cp936.po, zh_CN.po. In that case, dropping would generate a lot of
> > junk.
> >
> > I have now found that it is not a bug in msgfmt. The charset is
> > converted by:
> > iconv -f UTF-8 -t cp936 zh_CN.UTF-8.po | sed -e
> > 's/charset=utf-8/charset=gbk/ > ... So that means the charset and
> > pathname are decided by the software; msgfmt does not do charset
> > conversion at all, it is just a format translator. (btw, iconv.c is from
> > alpine)
>
> There are two things you seem to be missing:
>
> 1. musl does not, and won't, support non-UTF-8 locales, so there is no
> point in trying to load translations for them. Moreover, with the
> proposed changes to setlocale/locale_map.c, it will never be possible
> for the locale name to contain a . with anything other than UTF-8 (or,
> for compatibility, some variant like utf8) after it. So I don't see
> how there's any point in iterating and trying with/without .charset
> when the only possibilities are that .charset is blank, .UTF-8, or
> some misspelling of .UTF-8. In the latter case, we'd even have to do
> remapping of the misspellings to avoid having to have multiple
> dirs/symlinks.
>
> 2. From my perspective, msgfmt's production of non-UTF-8 .mo files is
> a bug. Yes the .po file can be something else, but msgfmt should be
> transcoding it at 'compile' time. There's at least one other change
> msgfmt needs for all features to work with musl's gettext -- expansion
> of SYSDEP strings to all their possible format patterns -- so I don't
> think it's a significant additional burden to ensure that the msgfmt
> used on musl-based systems outputs UTF-8.
>
> Of course software trying to do multiple encodings like you described
> will still install duplicate files unless patched, but any of them
> should work as long as msgfmt recoded them. In the meantime, distros
> can just patch the build process for software that's still installing
> non-UTF-8 locale files. AFAIK doing that is not a recommended practice
> even by the GNU gettext project, so the patches might even make it
> upstream.
>
> One thing we could do for robustness is check the .mo header at load
> time and, if it has a charset= specification with something other than
> UTF-8, reject it. I mainly suggest this in case the program is running
> on a non-musl system where a glibc-built version of the same program
> (e.g. vi) with non-UTF-8 .mo files is present and they're using the
> same textdomain dir (actually unlikely since prefix should be
> different). But if we do this it should be a separate patch because
> it's a separate functional change.
>
> Rich
>
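
The load-time robustness check suggested at the end of the quoted message
could look roughly like the sketch below. It is only an illustration, not
musl code: `mo_header_charset_ok` is a hypothetical name, and it assumes
the caller already has the .mo header text (the translation of the empty
msgid) as a NUL-terminated string. A header with no `charset=`
specification is accepted, since the suggestion is only to reject files
that declare something other than UTF-8.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical check (not from musl): returns 1 if the header declares
 * charset=UTF-8 (any letter case) or declares no charset at all,
 * and 0 if it declares any other charset. */
int mo_header_charset_ok(const char *hdr)
{
    /* The field normally appears as "Content-Type: ...; charset=NAME". */
    const char *p = strstr(hdr, "charset=");
    if (!p)
        return 1; /* no charset= specification: nothing to reject */

    p += strlen("charset=");

    /* The charset name ends at whitespace, ';' or the end of the string. */
    size_t n = strcspn(p, " \t\r\n;");

    return n == 5 && strncasecmp(p, "UTF-8", 5) == 0;
}
```

A header containing "charset=gbk", as produced by the sed command quoted
above, would be rejected, while "charset=utf-8" and "charset=UTF-8" would
both pass.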

View attachment "locale.diff" of type "text/plain" (3824 bytes)
