Date: Wed, 3 May 2017 13:53:19 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Issues with iconv conversions (UTF-8 -> cp*)

On Wed, May 03, 2017 at 05:20:47PM +0000, maksis . wrote:
> Hi,
> 
> I’m experiencing issues with iconv conversions from UTF-8 to Windows codepages (cp* -> UTF-8 seems to be working fine).
> 
> 
> Test program: https://gist.github.com/maksis/ef6562b43c94a6a29dc21b987ec4c0cf
> 
> gcc output:  UTF-8 -> cp1250 -> UTF-8: ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžźˇ˘Ł¤Ą¦§¨©Ş«¬®Ż°±˛ł´µ¶·¸ąş»Ľ˝ľżŔÁÂĂÄĹĆÇČÉĘËĚÍÎĎĐŃŇÓÔŐÖ×ŘŮÚŰÜÝŢßŕáâăäĺćçčéęëěíîďđńňóôőö÷řůúűüýţ
> 
> musl-gcc output:  UTF-8 -> cp1250 -> UTF-8: ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~€‚„…†‡‰Š‹ŚŤŽŹ‘’“”•–—™š›śťžźˇ˘Ł*Ą****Ş***Ż**˛ł*****ąş*Ľ˝ľżŔ**Ă*ĹĆ*Č*Ę*Ě**ĎĐŃŇ**Ő**ŘŮ*Ű**Ţ*ŕ**ă*ĺć*č*ę*ě**ďđńň**ő**řů*ű**ţ
> 
> Tested with GCC 6.3.1 and musl 1.1.16

Thanks. I've confirmed that this is a bug in conversion to (not from,
only to) legacy 8-bit codepages; there's missing logic for the case
(which is handled specially in the tables) where a Unicode codepoint
is represented by its own value (that fits in 8 bits), e.g. U+00A4 ¤,
which is byte 0xA4 in cp1250. I'll work on a patch to fix it and
follow up when it's either committed or ready for testing.

Rich
