Date: Thu, 27 Jun 2013 02:56:43 +0800
From: orc <orc@...server.ru>
To: musl@...ts.openwall.com
Subject: Re: Iconv and old codepages

Thanks Rich for your quick answer!

On Wed, 26 Jun 2013 14:34:32 -0400
Rich Felker <dalias@...ifal.cx> wrote:

> On Thu, Jun 27, 2013 at 02:15:39AM +0800, orc wrote:
> > Hi,
> > 
> > How many codepages does the in-musl iconv support?
> > Currently I'm trying to convert from "utf8" to "cp1251", and iconv()
> > only gives me a number of "*"s matching the UTF-8 input. Is this
> > correct behavior, meaning iconv() currently does not support non-UTF
> > legacy codepages? Even so, I still see many of them in
> > src/locale/codepages.h. The (dirty) test program is attached.
> > 
> > I also noticed the alternative libs thread and the corresponding wiki
> > page. Does anyone know of a lightweight iconv replacement to use as a
> > temporary measure (other than libiconv, for example)?
> 
> Should be fixed in git. In general, the state of musl's iconv is that
> the following charsets are supported:
> 
> utf8
> wchart
> ucs2
> ucs2be
> ucs2le
> utf16
> utf16be
> utf16le
> ucs4
> ucs4be
> utf32
> utf32be
> ucs4le
> utf32le
> ascii
> usascii
> iso646
> iso646us
> eucjp
> shiftjis
> sjis
> gb18030
> gbk
> gb2312
> iso88591
> latin1
> iso88592
> iso88593
> iso88594
> iso88595
> iso88596
> iso88597
> iso88598
> iso88599
> iso885910
> iso885911
> tis620
> iso885913
> iso885914
> iso885915
> latin9
> iso885916
> cp1250
> windows1250
> cp1251
> windows1251
> cp1252
> windows1252
> cp1253
> windows1253
> cp1254
> windows1254
> cp1255
> windows1255
> cp1256
> windows1256
> cp1257
> windows1257
> cp1258
> windows1258
> koi8r
> koi8u

So "most major encodings", yep.
Thanks, it is fixed and works now.
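For reference, here is a minimal sketch of the UTF-8 to CP1251 conversion discussed in this thread. It uses only the standard POSIX iconv_open(), iconv() and iconv_close() calls. The helper name utf8_to_cp1251 and its NUL-terminated buffer convention are assumptions for illustration and are not part of any library. Implementations differ in how they handle characters that have no CP1251 equivalent: some fail with EILSEQ, others substitute a placeholder such as the "*" mentioned above. The sketch therefore only reports what iconv() itself returns.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated UTF-8 string `in` to
 * CP1251 and stores the result, NUL-terminated, in `out`.
 * Returns the number of CP1251 bytes written (excluding the NUL), or -1
 * with errno set: E2BIG if `out` is too small, EILSEQ for an invalid (or,
 * on some libcs, unconvertible) sequence, EINVAL for a truncated one, or
 * whatever iconv_open() reported. */
static long utf8_to_cp1251(const char *in, char *out, size_t outsize)
{
    if (outsize == 0) {
        errno = E2BIG;
        return -1;
    }

    /* Argument order is (tocode, fromcode). */
    iconv_t cd = iconv_open("CP1251", "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;         /* iconv() takes char **, not const */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;   /* keep one byte for the NUL */

    /* CP1251 is stateless, so no final shift-state flush is needed. */
    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int err = errno;                /* save before iconv_close() */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = err;
        return -1;
    }
    *outp = '\0';
    return (long)(outp - out);
}
```

For example, the six-letter Cyrillic word "Привет" (12 bytes of UTF-8) should come out as the six CP1251 bytes 0xCF 0xF0 0xE8 0xE2 0xE5 0xF2.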

> 
> Non-alphanumeric characters are ignored in matching charset names, so
> all combinations of hyphens and underscores are also supported with
> these.
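The following C sketch illustrates the matching rule. The comparison skips non-alphanumeric characters, so "UTF-8", "utf_8" and "utf8" all compare equal. This is not musl's actual code. The helper names charset_name_matches and charset_is_known, the small name table, and the ASCII case folding are assumptions for illustration; the message above only says that non-alphanumeric characters are ignored.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the user-supplied charset name matches
 * the canonical name after dropping every non-alphanumeric character and
 * folding ASCII case, 0 otherwise. The canonical name is assumed to be
 * lowercase and purely alphanumeric, like "windows1251" in the list. */
static int charset_name_matches(const char *user, const char *canonical)
{
    const unsigned char *u = (const unsigned char *)user;
    const unsigned char *c = (const unsigned char *)canonical;

    for (;;) {
        /* Skip hyphens, underscores, spaces and other punctuation. */
        while (*u && !isalnum(*u))
            u++;
        if (!*u || !*c)
            break;
        if (tolower(*u) != *c)
            return 0;
        u++;
        c++;
    }
    /* Both names must run out at the same point. */
    return *u == '\0' && *c == '\0';
}

/* Hypothetical table holding a few canonical names from the list above. */
static const char *const known_charsets[] = {
    "utf8", "cp1251", "windows1251", "koi8r", "iso88595", NULL
};

static int charset_is_known(const char *user)
{
    for (size_t i = 0; known_charsets[i]; i++)
        if (charset_name_matches(user, known_charsets[i]))
            return 1;
    return 0;
}
```

With this rule, "ISO-8859-5" matches "iso88595" and "KOI8-R" matches "koi8r". A near miss such as "utf8x" is still rejected.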
> 
> One caveat which should not affect your usage is that the following
> charsets are only supported as the "from" charset, not the "to"
> charset:
> 
> eucjp
> shiftjis
> sjis
> gb18030
> gbk
> gb2312
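Some charsets are supported in only one direction, so a program may want to probe a pair before relying on it. The sketch below uses only the standard POSIX iconv_open() and iconv_close() calls. The helper name conversion_supported is hypothetical. It assumes that an unsupported pair makes iconv_open() fail with EINVAL, as POSIX specifies. The thread does not say whether a particular libc rejects a one-way pair when the descriptor is opened or only during conversion, so treat the result as a hint.

```c
#include <errno.h>
#include <iconv.h>

/* Hypothetical helper: returns 1 if a converter from `from` to `to` can be
 * opened, 0 if iconv_open() fails with EINVAL (pair not supported), and
 * -1 on any other failure (e.g. out of memory or descriptors). */
static int conversion_supported(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from);   /* target charset comes first */
    if (cd == (iconv_t)-1)
        return errno == EINVAL ? 0 : -1;
    iconv_close(cd);
    return 1;
}
```

For example, conversion_supported("UTF-8", "EUC-JP") asks whether EUC-JP can be decoded. conversion_supported("EUC-JP", "UTF-8") asks about encoding to it, which is the direction the caveat above says musl does not support.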
> 
> Until the latest commit, the legacy 8bit codepages were also broken as
> the "to" charset, but this breakage was unintentional.

While digging through the code, I did not notice that either.

> 
> 
> Rich
