Message-ID: <20260403200648.GQ1827@brightrain.aerifal.cx>
Date: Fri, 3 Apr 2026 16:06:48 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: iconv GB18030 DoS issue

On Fri, Apr 03, 2026 at 12:01:25AM -0400, Rich Felker wrote:
> there are still a small number of mappings that are incorrect due to
> late changes made in the definition of gb18030, swapping PUA
> codepoints with proper Unicode characters. correcting these requires a
> postprocessing step that will be added later.

To be specific, what we currently implement is the original 2000
version of GB 18030. Subsequent changes were made in 2005 and 2022,
swapping PUA codepoints that were wrongly used for characters that
Unicode hadn't assigned yet with the new assignments, so that the
correct characters end up in the 2-byte range.

I've now validated that what's in musl matches the normative
GB 18030-2000 xml definition at

https://raw.githubusercontent.com/unicode-org/icu-data/tags/tzu-1-4-0/charset/data/xml/gb-18030-2000.xml

so applying the swaps to get the 2022 version should be easy if we
want to.

Rich
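
For illustration only, the postprocessing step described above could look something like the following. This is a minimal sketch, not musl's code: the names gb18030_swaps and gb18030_fixup are hypothetical, and the table holds only the one pair I understand the 2005 revision to have exchanged (U+E7C7 and U+1E3F, moving U+1E3F to the 2-byte sequence A8 BC). The 2022 pairs are left out and would have to be taken from the normative tables. It assumes every change really is a pairwise swap, as the message says.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical swap table: each row is a pair of code points whose
 * GB 18030 byte sequences were exchanged after the 2000 edition.
 * Only the pair believed to come from the 2005 revision is listed;
 * the 2022 pairs are deliberately omitted, not guessed. */
static const uint32_t gb18030_swaps[][2] = {
    { 0xE7C7, 0x1E3F },
};

/* Translate between the code point the 2000 tables use for a byte
 * sequence and the one a later edition uses for it. A pairwise swap
 * is its own inverse, so the same function can run after the table
 * lookup when decoding and before it when encoding. */
static uint32_t gb18030_fixup(uint32_t c)
{
    size_t n = sizeof gb18030_swaps / sizeof gb18030_swaps[0];
    for (size_t i = 0; i < n; i++) {
        if (c == gb18030_swaps[i][0]) return gb18030_swaps[i][1];
        if (c == gb18030_swaps[i][1]) return gb18030_swaps[i][0];
    }
    return c; /* every other code point is unchanged */
}
```

With this in place, a decoder built on the 2000 tables would return gb18030_fixup(c) instead of c, and an encoder would look up gb18030_fixup(c) instead of c. A linear scan seems adequate here because the list of swapped pairs is short.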