Date: Wed, 07 Aug 2013 15:20:25 +0800
From: Roy <roytam@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: Re: Re: iconv Korean and Traditional Chinese research so far

On Wed, 07 Aug 2013 08:54:35 +0800, Roy <roytam@...il.com> wrote:

[snip]
>
> Big5-HKSCS 2004 map for reference:
> http://moztw.org/docs/big5/table/hkscs2004.txt
> Use sed and awk to create b2u.txt for comparing:
> $ sed -e '/^==/d' -e '1,2d' hkscs2004.txt| awk 'BEGIN{print "# big5  
> unicode"}{print "0x" $1 " 0x" $4}' > hkscs2004-b2u.txt
> In result:
> http://roy.dnsd.me/hkscs2004-b2u.txt
>
> And finally the diff:
> http://roy.dnsd.me/uao250-hkscs2004.diff
>
> The diff is huge so separated table is needed.

I forgot that the HKSCS table is missing the original CP950 entries.
$ cat cp950-b2u.txt hkscs2004-b2u.txt | sed -e '1d'|sort >  
hkscs2004-big5-b2u.txt

I also wrote a small PHP utility that compares two tables by key (the  
first column):
http://roy.dnsd.me/tbldiff.phps

$ php tbldiff.php uao250-b2u.txt hkscs2004-big5-b2u.txt >  
uao250-vs-hkscs2004.txt
http://roy.dnsd.me/uao250-vs-hkscs2004.txt
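
For readers who cannot fetch tbldiff.phps, here is a rough Python sketch of the same idea: compare two "0xBIG5 0xUNICODE" tables by their first column. This is not the PHP script and does not reproduce its output format. The function names load_table and diff_tables are made up for illustration, and the sample rows in the demo are illustrative values, not rows copied from the real tables.

```python
def load_table(lines):
    """Parse '0xBIG5 0xUNICODE' lines into a dict keyed by the first column.

    Comment lines (starting with '#') and blank lines are skipped.
    If a key appears twice, the later line wins (an assumption of this sketch).
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        # Normalise case so 0xA440 and 0xa440 compare equal.
        table[fields[0].lower()] = fields[1].lower()
    return table


def diff_tables(a, b):
    """Yield (key, value_in_a, value_in_b) for each key whose mapping differs.

    None marks a key that is absent from that table.
    """
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va != vb:
            yield key, va, vb


# Small demo with illustrative rows only.
first = load_table(["# big5 unicode", "0xA440 0x4E00", "0xA441 0x4E59"])
second = load_table(["0xA440 0x4E00", "0xA441 0x4E5A", "0xA442 0x4E01"])
for key, va, vb in diff_tables(first, second):
    print(key, va or "-", vb or "-")
```

To run it on real files, pass open file objects instead of the lists, for example load_table(open("uao250-b2u.txt")).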

$ sed -e '/==/d' uao250-vs-hkscs2004.txt > uao250-hkscs2004-diff.txt
http://roy.dnsd.me/uao250-hkscs2004-diff.txt

So 5965 mappings differ, including 1379 mappings that do not exist in  
HKSCS2004.
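
Counts like these could be derived with something along the following lines. This is only a sketch under an assumption: it counts keys of the first table whose mapping is different or absent in the second table, which may not be exactly how the figures above were obtained. The name summarize is hypothetical, and the tables are plain dicts mapping a Big5 code string to a Unicode code string.

```python
def summarize(a, b):
    """Return (differing, missing_in_b) for the keys of table `a`.

    differing    -- keys of `a` whose value in `b` is different or absent
    missing_in_b -- the subset of those keys that do not exist in `b` at all
    """
    differing = 0
    missing_in_b = 0
    for key, value in a.items():
        if key not in b:
            differing += 1
            missing_in_b += 1
        elif b[key] != value:
            differing += 1
    return differing, missing_in_b


# Illustrative input only, not real table rows.
counts = summarize(
    {"0xa440": "0x4e00", "0xa441": "0x4e59", "0xa442": "0x4e01"},
    {"0xa440": "0x4e00", "0xa441": "0x4e5a"},
)
print("differing: %d, of which missing: %d" % counts)
```

Keys that exist only in the second table are ignored here on purpose; counting them as well would need a second pass in the other direction.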

But since HKSCS2001 and HKSCS2004 are both in mixed use, in local files  
as well as on Internet pages, the situation for HKSCS is even worse.

BTW, there is another NLS hack that patches MS-CP932 to support JIS X  
0213:2004:
http://www.eonet.ne.jp/~kotobukispace/ddt/jisx0213/jisx0213.html
