Date: Tue, 21 Nov 2017 22:25:24 -0500
From: Rich Felker <>
Subject: Re: cp437 issue with bad mapping at least for one char

On Wed, Nov 22, 2017 at 03:50:48AM +0100, Jacob Thrane Lund wrote:
> Hi musl devs,
> I experienced a test failing when building the latest version of gammu for Alpine Linux.
> After reporting the issue to the gammu developer the reached conclusion was the issue is with musl -
> I have checked the log for
> and Rich Felker pushed a commit 8 days ago. As of yet I have not had
> the chance to verify if this also resolves this issue. Dealing with
> charsets at this level is for me totally new territory..
> I was hoping you could confirm/deny if Rich’s commit indeed also resolves my issue?

It does. Here is how CP437 decodes, before:

Çüéâäàåç êëèïîìÄÅ ÉæÆôöòûù ÿÖÜ¢£¥₧ƒ  ¡¢£¤¥¦§ ¨©ª«¬­®¯ ░▒▓│┤╡╢╖ ╕╣║╗╝╜╛┐
└┴┬├─┼╞╟ ╚╔╩╦╠═╬╧ ╨╤╥╙╘╒╓╫ ╪┘┌█▄▌▐▀ αáΓπΣσæτ ΦΘΩδìφεï ðñ≥≤⌠⌡÷≈ °∙·√ü²■ 

and after:

Çüéâäàåç êëèïîìÄÅ ÉæÆôöòûù ÿÖÜ¢£¥₧ƒ áíóúñÑªº ¿⌐¬½¼¡«» ░▒▓│┤╡╢╖ ╕╣║╗╝╜╛┐
└┴┬├─┼╞╟ ╚╔╩╦╠═╬╧ ╨╤╥╙╘╒╓╫ ╪┘┌█▄▌▐▀ αßΓπΣσµτ ΦΘΩδ∞φε∩ ≡±≥≤⌠⌡÷≈ °∙·√ⁿ²■ 
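
For anyone wanting an independent reference to compare against, here is a
minimal sketch that decodes the 0xA0-0xAF row with Python's built-in cp437
codec. This is Python's own table, not musl's iconv, so it only serves as a
cross-check of what that row is expected to contain.

```python
# Cross-check sketch: decode bytes 0xA0..0xAF with Python's built-in
# "cp437" codec. This does not exercise musl's iconv in any way.
row = bytes(range(0xA0, 0xB0)).decode("cp437")

# Print the row in two groups of eight, like the dumps above.
print(row[:8], row[8:])

# 0xA1 is the slot that should decode to U+00ED (í).
print(hex(ord(row[1])))
```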

The problem (silently fixed) was two-fold. The table generation code
for legacychars.h ignored entries in the Unicode charmap files that
used lowercase a-f in the hex, _and_ it omitted characters that
appeared in the same slot as their Unicode codepoint, since these
previously got a special encoding. In all the ISO-8859 encodings
containing í, it appears in "its own" slot (0xED for U+00ED). If not
for the latter, this character would have been included in the
legacychars.h map already due to being in Latin-1, where the charmap
file used uppercase.
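
To make the interaction of the two conditions concrete, here is a rough
sketch in Python. It is not the actual table generation code: the line
format, the sample entries, and the names (collect_legacy_chars, STRICT,
LENIENT) are all assumptions made up for illustration.

```python
import re

# Hypothetical charmap excerpts: "0x<byte> 0x<codepoint> # name".
# The Latin-1 sample uses uppercase hex, the CP437 sample lowercase.
LATIN1 = "0xED\t0x00ED\t# LATIN SMALL LETTER I WITH ACUTE\n"
CP437 = "0xa1\t0x00ed\t# LATIN SMALL LETTER I WITH ACUTE\n"

STRICT = re.compile(r"^0x([0-9A-F]{2})\s+0x([0-9A-F]{4})")          # uppercase only
LENIENT = re.compile(r"^0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})")  # either case

def collect_legacy_chars(charmaps, pattern, skip_identity):
    """Collect codepoints that would need an entry in the shared map."""
    chars = set()
    for text in charmaps:
        for line in text.splitlines():
            m = pattern.match(line)
            if not m:
                continue  # line silently ignored, e.g. lowercase hex under STRICT
            byte, codepoint = int(m.group(1), 16), int(m.group(2), 16)
            if skip_identity and byte == codepoint:
                continue  # character sits in "its own" slot
            chars.add(codepoint)
    return chars

both_bugs = collect_legacy_chars([LATIN1, CP437], STRICT, skip_identity=True)
hex_fixed = collect_legacy_chars([LATIN1, CP437], LENIENT, skip_identity=True)
no_skip = collect_legacy_chars([LATIN1, CP437], STRICT, skip_identity=False)

print(0xED in both_bugs, 0xED in hex_fixed, 0xED in no_skip)
```

With both conditions in effect the character is dropped from the set;
relaxing either one is enough to bring it back in this toy setup.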

Somehow when the character was missing in legacychars.h, the mapping
tables ended up containing nonsense.

