Date: Tue, 23 Apr 2013 22:27:09 +0200
From: magnum <>
To: "" <>
Subject: Unicode parsing

Dhiru, Shane,

Commit fae8b70 broke Unicode support completely for Python 2.7. Files with any non-ASCII character in their name (even if Latin-1) b0rked out. Also, even before that commit we never had document info -> GECOS working at all with Unicode.

I have fixed this for Python 2.7 (including correctly parsing whatever codepage is used in document info, if not UTF-8), and along the way I could drop a lot of code that was just muting "binary" data that was really incorrectly parsed UTF-16.
It's possible I broke it for 3.3 again, so please test. I committed my changes to bleeding-jumbo only for now.
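
To illustrate the codepage handling described above, here is a minimal sketch, not the code from the actual commit. It assumes the raw property bytes and the numeric codepage have already been read from the OLE property set. The names decode_property and SPECIAL_CODEPAGES are hypothetical, and the Latin-1 fallback is an assumption made for this example. It should run on both 2.7 and 3.3.

```python
# -*- coding: utf-8 -*-
import codecs

# Hypothetical helper for illustration only; not taken from the real commit.
# Windows codepage numbers that have no "cpNNNN" codec name in Python.
SPECIAL_CODEPAGES = {
    1200: "utf-16-le",
    1201: "utf-16-be",
    65001: "utf-8",
}


def decode_property(raw, codepage):
    """Decode raw document-info bytes using the property set's codepage."""
    codec = SPECIAL_CODEPAGES.get(codepage, "cp%d" % codepage)
    try:
        codecs.lookup(codec)
    except LookupError:
        # Assumption: fall back to Latin-1, which can decode any byte value.
        codec = "latin-1"
    # Replace undecodable bytes instead of raising, then strip NUL padding.
    return raw.decode(codec, "replace").rstrip(u"\x00")
```

With this approach a UTF-16 string is decoded as text rather than being treated as "binary" and muted, and an 8-bit codepage such as 1252 is decoded with the matching cpNNNN codec.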

I have added a test file to the wiki that has a Unicode filename and OLE document properties with Unicode characters. The correct output should look like this:

вα¢к_тσ_єηgℓαη∂.doc:$oldoffice$1*a675a1f0bf660daa552bde6cd36f7f18*727256500c100825b04e6616280dfbf0*104b30d27cc36aad1069e7a377e75126:::¢αη αℓѕσ нανє ℓσηg ¢нαтѕ ωιтн уσυ αвσυт ѕнσєѕ, мαкє υρ αη∂ ¢ℓσтнєѕ! вє¢αυѕє уσυ αяє ιη ℓσνє ωιтн ѕнσρριηg! ι нσρє уσυ ¢σмє вα¢к тσ єηgℓαη∂! 0::/Volumes/jtr16/вα¢к_тσ_єηgℓαη∂.doc

(that weird string in the GECOS field can be found in the Rockyou list =)

Please test this file on 3.3 and try to fix any issues with it or with my trial'n'error coding.

magnum (now Certified Python dilettante™)
