Date: Tue, 23 Apr 2013 22:27:09 +0200
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: office2john.py Unicode parsing

Dhiru, Shane,

Commit fae8b70 broke unicode support completely for Python 2.7. Files with any non-ascii character in their name (even if Latin-1) b0rked out. Also, even before that commit we never had document info -> GECOS working at all with Unicode.

I have fixed this for Python 2.7 (including correctly parsing whatever codepage is used in document info, if not UTF-8) and along the way I could drop a lot of code that was just muting "binary" which was really incorrectly parsed UTF-16. It's possible I broke it for 3.3 again so please test. I committed my changes to bleeding-jumbo only for now.

I have added a test file with a unicode filename and that has OLE document properties with Unicode characters to the wiki (http://openwall.info/wiki/john/sample-non-hashes). The correct output should be like this:

вα¢к_тσ_єηgℓαη∂.doc:$oldoffice$1*a675a1f0bf660daa552bde6cd36f7f18*727256500c100825b04e6616280dfbf0*104b30d27cc36aad1069e7a377e75126:::¢αη αℓѕσ нανє ℓσηg ¢нαтѕ ωιтн уσυ αвσυт ѕнσєѕ, мαкє υρ αη∂ ¢ℓσтнєѕ! вє¢αυѕє уσυ αяє ιη ℓσνє ωιтн ѕнσρριηg! ι нσρє уσυ ¢σмє вα¢к тσ єηgℓαη∂! 0::/Volumes/jtr16/вα¢к_тσ_єηgℓαη∂.doc

(that weird string in the GECOS field can be found in the Rockyou list =)

Please test this file on 3.3 and try to fix any issues with it or with my trial'n'error coding.

cheers,
magnum
(now Certified Python dilettante™)
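
For anyone testing, here is a rough sketch of the kind of codepage-aware decoding described above. It is not the code committed to office2john.py; the names `SPECIAL_CODEPAGES`, `codec_for_codepage` and `decode_property` are made up for illustration, and it assumes the raw property bytes and the numeric Windows codepage have already been read from the document info.

```python
import codecs

# Windows codepage numbers whose Python codec name is not simply "cp<number>".
SPECIAL_CODEPAGES = {
    1200: "utf-16-le",   # UTF-16 little endian
    1201: "utf-16-be",   # UTF-16 big endian
    65001: "utf-8",
}


def codec_for_codepage(codepage):
    """Map a Windows codepage number to a Python codec name (hypothetical helper)."""
    name = SPECIAL_CODEPAGES.get(codepage, "cp%d" % codepage)
    try:
        codecs.lookup(name)
    except LookupError:
        # Unknown codepage: Latin-1 can decode any byte sequence.
        return "latin-1"
    return name


def decode_property(raw, codepage):
    """Decode a raw property string and drop trailing NUL terminators."""
    text = raw.decode(codec_for_codepage(codepage), "replace")
    return text.rstrip("\x00")
```

Decoding by the declared codepage, instead of treating anything non-ASCII as binary, is what keeps UTF-16 property strings readable. The Latin-1 fallback is just one possible choice for unknown codepages.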