Date: Fri, 2 Apr 2010 01:20:08 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: rule and encoding wordlist
On Wed, Mar 31, 2010 at 09:23:47PM +0200, websiteaccess@...il.com wrote:
> I use JTR 1.7.5 with latest patches, os X, terminal is UTF-8.
>
> With following rule (below) and a wordlist (1 word "tro") encoded
> Western (Windows Latin 1) , end of line Windows (CRLF)
>
> >\r[00-9A-C] A\p0[0-9A-D],him, $1
>
> I get
>
> iMac-de-xxx-xx:run xxxxx$ ./john -w:testmot.txt -rules -stdout
> himtro1
> thimro1
> trhimo1
> trohim1
> words: 4 time: 0:00:00:00 100.00% (ETA: Wed Mar 31 21:10:40
Looks good. However, if a wordlist entry actually contains any 8-bit
iso-8859-1 character, it will likely be displayed incorrectly on your
UTF-8 terminal, and it will be tested against your hashes in the
iso-8859-1 encoding (which may or may not be what you want).
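To illustrate the encoding difference, here is a minimal Python sketch (not JtR code). The word "café" is a made-up example and does not come from your wordlist. It shows that the same word is a different byte string in each encoding, and that the iso-8859-1 bytes do not decode cleanly as UTF-8:

```python
# Illustration only: "café" is a hypothetical wordlist entry.
word = "café"

latin1_bytes = word.encode("iso-8859-1")  # 4 bytes, the accented letter is 0xE9
utf8_bytes = word.encode("utf-8")         # 5 bytes, the accented letter is 0xC3 0xA9

# A UTF-8 decoder cannot interpret the lone 0xE9 byte, so it substitutes
# the replacement character U+FFFD, much like a UTF-8 terminal would.
shown_on_utf8_terminal = latin1_bytes.decode("utf-8", errors="replace")

print(latin1_bytes)
print(utf8_bytes)
print(repr(shown_on_utf8_terminal))
```

Since the two byte strings differ, they also hash differently, which is why the encoding of the wordlist matters when testing against hashes.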
> With the same rule but this time my wordlist is unicode UTF-8 , end of
> line Windows (CRLF) or Unix (LF)
>
> I get :
>
> john -w:testmot.txt -rules -stdout
> himtro1
> ?him??tro1
> ?him?tro1
> himtro1
> thimro1
> trhimo1
> trohim1
> words: 7 time: 0:00:00:00 100.00% (ETA: Wed Mar 31 21:12:24 2010)
>
> As you can see some unwanted "?" are now included in the word
> generated.
Did you mean the question mark character, or did some other character
get replaced by a question mark when you sent your message?
Anyhow, JtR does not support multi-byte characters (in fact, it is
unaware of what character encoding your wordlist is in). Most of the
time, this is not a problem for UTF-8, because JtR will simply pass any
UTF-8 characters from a wordlist into its password hashing routines
verbatim (treating the multi-byte characters as multiple single-byte
characters, which works just fine). However, inserting strings at
arbitrary character positions is a problem: positions are counted in
bytes, so the rule you mentioned may insert the string "him" between
the individual bytes of a single multi-byte UTF-8 character, thereby
breaking that character.
In your sample JtR output above, the input word appears to be treated as
being 6 single-byte characters long. Since it does not appear to
actually use any multi-byte characters, my guess is that your wordlist
file starts with a byte order mark (BOM), which is 3 bytes in UTF-8. It
is this character that gets broken by having "him" inserted between its
bytes. The extra length also explains the extra output lines: a 6-byte
word gives the rule 7 insertion positions instead of 4.
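The BOM guess can be checked with a small Python sketch. This is a byte-level imitation of what the rule does, not JtR code, and the helper names insert_everywhere() and is_valid_utf8() are made up for this illustration. It assumes the first wordlist line is the 3-byte UTF-8 BOM followed by "tro":

```python
# Byte-level imitation of the rule: insert "him" at every offset, append "1".
BOM = b"\xef\xbb\xbf"   # U+FEFF encoded in UTF-8 (3 bytes)
word = BOM + b"tro"     # assumed first line of the wordlist: 6 bytes in total


def insert_everywhere(word: bytes, s: bytes = b"him", suffix: bytes = b"1"):
    """Insert s at every byte offset 0..len(word), then append suffix."""
    return [word[:p] + s + word[p:] + suffix for p in range(len(word) + 1)]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


candidates = insert_everywhere(word)
for c in candidates:
    print(c, "ok" if is_valid_utf8(c) else "BROKEN")
print("words:", len(candidates))
```

Under these assumptions the sketch yields 7 candidates, 2 of which are no longer valid UTF-8 because "him" landed inside the BOM. For the plain 3-byte word "tro" the same function yields 4 candidates, which matches the word counts in the two runs quoted above.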
> I use very large wordlist (up to 60 gigas) , I can't reencode them.
You also posted another message:
http://www.openwall.com/lists/john-users/2010/04/01/2
stating that you "reencoded file with Unicode (UTF-8, no BOM)". That
helps; however, please be aware that it won't save you from having other
multi-byte UTF-8 characters broken, and extra output lines produced, by
the rule you mentioned. You might not care, though, because hopefully
relatively few wordlist entries contain multi-byte UTF-8 characters,
and the only impact is that JtR tries some extra candidate passwords
(it will also generate and try all the correct/intended ones).
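The claim that the intended candidates are still generated can be illustrated in Python. This is a sketch, not JtR code, and the word "café" is a hypothetical entry. It compares insertion at every byte offset (what the rules engine effectively does) with insertion at every character position (what you intended):

```python
# Hypothetical UTF-8 wordlist entry with one 2-byte character.
word = "café"
raw = word.encode("utf-8")  # 5 bytes for 4 characters

# Insertion at every byte offset, as a byte-oriented rules engine would do.
byte_level = {raw[:p] + b"him" + raw[p:] for p in range(len(raw) + 1)}

# Insertion at every character position, as intended.
char_level = {
    (word[:i] + "him" + word[i:]).encode("utf-8") for i in range(len(word) + 1)
}

extra = byte_level - char_level
print(len(byte_level), "byte-level candidates")
print(len(char_level), "intended candidates")
print("extra:", extra)
```

Every intended candidate is in the byte-level set. The only extra one is the candidate where "him" splits the two bytes of the accented letter.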
A way to avoid this problem would be to have your wordlist in the
iso-8859-1 encoding (which uses single-byte 8-bit characters) and to
have an external filter() convert from iso-8859-1 to UTF-8. The
filter() would be applied after the rules, thereby avoiding the problem
with the lack of UTF-8 support in the rules engine, yet probing UTF-8
encoded strings against your hashes. Someone should write such an
external filter().
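The conversion such a filter() would have to perform is simple. The sketch below shows only the byte-level logic in Python; it is not an external mode definition, and the function name latin1_to_utf8() is made up here. A real filter() would have to be written in the C-like external mode language in the configuration file, operating on the word[] array, and would also have to respect JtR's maximum candidate length since the output can be up to twice as long as the input.

```python
def latin1_to_utf8(candidate: bytes) -> bytes:
    """Convert an iso-8859-1 byte string to UTF-8, one byte at a time."""
    out = bytearray()
    for b in candidate:
        if b < 0x80:
            # 7-bit ASCII is identical in both encodings.
            out.append(b)
        else:
            # Code points 0x80..0xFF become two bytes in UTF-8:
            # 110000xx 10xxxxxx
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)


# Usage: a candidate as it would leave the rules engine (iso-8859-1 bytes).
print(latin1_to_utf8(b"caf\xe9him1"))
```

Because the conversion runs after the rules have been applied, the insertion positions are computed on single-byte characters and no multi-byte character can be split.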
Alexander
P.S. I'd like to remind you that whenever you need to post a follow-up
to your own posting, you need to "reply" to that posting (either as you
received it via the list or as you sent it - you do save sent messages
in a "Sent" folder, don't you). Don't post such follow-ups to the list
anew. Doing so starts a new thread in the list archives, which is what
happened this time (you started two separate threads for the same topic).