Date: Mon, 14 Mar 2011 01:23:51 +0100
From: magnum <rawsmooth@...dband.net>
To: john-users@...ts.openwall.com
Subject: UTF-8 patch
I just uploaded a "UTF-8 awareness" patch to the wiki
(http://openwall.info/wiki/john/patches).
It adds the new option flag --utf8. Without this flag, John behaves as
usual - that is, for any format (for example NT) that internally
converts to Unicode (UTF-16 or UCS-2), the conversion assumes ISO-8859-1
input. This means you can't crack passwords containing characters not
present in ISO-8859-1.
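
To illustrate the difference, here is a rough Python sketch (not code from the patch; the function names are made up for this example). It assumes the legacy conversion simply widens every input byte to one 16-bit code unit, which is what treating the input as ISO-8859-1 amounts to:

```python
def latin1_to_utf16le(raw: bytes) -> bytes:
    """Treat every input byte as one ISO-8859-1 character (legacy behaviour)."""
    return raw.decode("iso-8859-1").encode("utf-16-le")


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Decode the input as UTF-8 first, then encode it as UTF-16LE."""
    return raw.decode("utf-8").encode("utf-16-le")


# Bytes as they would appear in a UTF-8 encoded wordlist.
word = "pass€".encode("utf-8")

legacy = latin1_to_utf16le(word)  # the Euro sign becomes three bogus characters
aware = utf8_to_utf16le(word)     # the Euro sign becomes the single unit U+20AC

print(legacy.hex())
print(aware.hex())
```

The two results differ, so a hash computed from the legacy conversion can never match a password that really contains the Euro sign.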
Using this flag makes John assume UTF-8 input instead. That is, you
should feed it wordlists encoded in UTF-8, and preferably hash files
whose user names and other info fields are also UTF-8, so that --single
mode works best. For unaffected formats the option is ignored, except
that it still controls the two new rejection rules:
-u reject rule unless the --utf8 option is used
-U reject rule if the --utf8 option is used
The former can be prepended to rules that are tailored for UTF-8, and
the latter to rules that are specific to ISO-8859-1. Most other rules
should use neither.
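
As a toy model of how these two flags behave (this is not John's rule parser; rule_is_active is a hypothetical helper, and the sample rules are arbitrary), the selection logic can be sketched in Python like this:

```python
def rule_is_active(rule: str, utf8_enabled: bool) -> bool:
    """Return False if a leading -u or -U flag rejects the rule.

    Simplified model: flags are assumed to be whitespace-separated
    tokens that come before the first rule command.
    """
    for token in rule.split():
        if not token.startswith("-"):
            break  # first command reached, no more flags to check
        if token == "-u" and not utf8_enabled:
            return False  # -u: reject unless --utf8 is used
        if token == "-U" and utf8_enabled:
            return False  # -U: reject if --utf8 is used
    return True


# Arbitrary sample rules: one generic, one UTF-8 only, one ISO-8859-1 only.
rules = ["l", "-u c", "-U u"]

with_utf8 = [r for r in rules if rule_is_active(r, utf8_enabled=True)]
without_utf8 = [r for r in rules if rule_is_active(r, utf8_enabled=False)]

print(with_utf8)
print(without_utf8)
```

The rule without a flag is kept in both runs, while each flagged rule survives in only one of them.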
The SAPg format does use UTF-8 internally, and with this patch you can
turn off the incomplete ISO-8859-1 conversion it originally performed
and feed it UTF-8 directly.
Other affected formats: mscash, mscash2, mschapv2, mssql, mssql05,
netlmv2, netntlm, netntlmv2. A new format is also included:
raw-md5-unicode, which is md5(unicode($p)) with optional UTF-8 support.
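
A minimal Python sketch of what md5(unicode($p)) computes, assuming unicode() here means UTF-16LE without a byte order mark (raw_md5_unicode and its utf8 parameter are names invented for this example, not taken from the patch):

```python
import hashlib


def raw_md5_unicode(raw: bytes, utf8: bool = False) -> str:
    """MD5 over the UTF-16LE form of the candidate password.

    utf8=False models the default ISO-8859-1 assumption,
    utf8=True models running with --utf8.
    """
    source_encoding = "utf-8" if utf8 else "iso-8859-1"
    text = raw.decode(source_encoding)
    return hashlib.md5(text.encode("utf-16-le")).hexdigest()


# A candidate word as read from a UTF-8 encoded wordlist.
candidate = "müller".encode("utf-8")

print(raw_md5_unicode(candidate, utf8=False))
print(raw_md5_unicode(candidate, utf8=True))
```

For pure ASCII candidates both modes give the same hash; they only diverge once the word contains non-ASCII characters.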
This is somewhat EXPERIMENTAL and I haven't tested it on any other
platform than Linux-x86-64. I know of no bugs though, except for the
following which I believe is not my fault:
The NT format seems to have a bug that makes it fail if the second
character of the plaintext is U+2000 or higher (for example a Euro
sign). From all I can tell this is an old bug, but we could never
trigger it until now because the highest character we could use was
U+00FF.
You can do "john --test --utf8" to benchmark just the formats that are
affected. Note that we lack UTF-8 / ISO-8859-1 specific self-tests for
some formats; any help adding them would be great.
cheers
magnum