Date: Sat, 26 Sep 2015 15:35:42 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: First feedback on new C locale problems

On Sat, Sep 26, 2015 at 06:58:36AM +0200, Felix Janda wrote:
> On 2015-09-09 05:56:48 GMT, Rich Felker wrote:
> > On Tue, Sep 01, 2015 at 02:32:35AM -0400, Rich Felker wrote:
> > > What I'd like to do to fix it is just always return "UTF-8" for
> > > nl_langinfo(CODESET) regardless of locale (rather than returning
> > > "UTF-8-CODE-UNITS" when in C locale). POSIX places no requirements on
> > > nl_langinfo that would preclude this, and it seems like it would
> > > restore the desired properties and fix all the regressions.
> >
> > Committed.
> >
> > Rich
> 
> GNU sed seems to care about the output from nl_langinfo:
> 
> https://bugs.gentoo.org/show_bug.cgi?id=560728
> 
> More specifically, so does lib/localecharset.c, which is used in
> the replacement of re_compile_pattern.

I was able to reproduce this (with slightly different output, "a© a'")
on Alpine. Clearly this is some sort of bug in the gnulib code or sed
itself, since it's producing corrupt output. I think we should explore
why that's happening and whether it can be fixed there. But if there
turn out to be other reasons that returning "UTF-8" in the C locale is
not practical, then perhaps we could resort to returning "ASCII".
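
For reference, here is a minimal sketch of how an application might query
the value under discussion. The helper names current_codeset and
codeset_is_utf8 are hypothetical, made up for illustration; they are not
taken from musl, gnulib, or sed. The string returned depends on the libc
and its version, so the sketch makes no claim about what any particular
system reports.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: ask libc which codeset name it reports for the
 * locale that is currently in effect. nl_langinfo() returns a pointer to
 * storage owned by libc, so the caller must not modify or free it. */
static const char *current_codeset(void)
{
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: the kind of check a program might make before
 * deciding to treat its input as multibyte UTF-8. Only the exact
 * spelling "UTF-8" is matched here; real code may accept aliases. */
static int codeset_is_utf8(const char *name)
{
    return name != NULL && strcmp(name, "UTF-8") == 0;
}
```

A caller would typically run setlocale(LC_ALL, "C") (or "" to honor the
environment) first, then pass current_codeset() to codeset_is_utf8().
Code that branches on this string is what makes the choice of return
value in the C locale visible to applications.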

Rich
