john-users - Re: big wordlists

Message-ID: <20090219215520.GA28219@openwall.com>
Date: Fri, 20 Feb 2009 00:55:20 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: big wordlists

On Thu, Feb 19, 2009 at 05:33:02PM +0000, kalgecin@...il.com wrote:
> i've recently generated a 3gig wordlist using john's rules. but john
> couldn't open the wordlist. must i split the file or is there any
> other way?

If you feel you really want to use a wordlist this large, then currently
the best way to do it would be to use a 64-bit build of JtR.  It may be
faster as well.  Of course, your hardware and OS must be 64-bit capable.

32-bit builds of JtR are currently limited to wordlists of up to 2 GB.

On Thu, Feb 19, 2009 at 11:05:40AM -0700, RB wrote:
> Not addressing the size, but the methodology: why?  This is precisely
> why wordlists are implemented the way they are in JtR - you have a
> basic dictionary, then apply mutators on the fly.  The CPU cycles
> spent performing the mutations are minuscule in comparison to the rest
> of the processing, and [likely] orders of magnitude faster than
> waiting on disk I/O for such large lists.
>
> If you need the generated list for other purposes (like feeding to
> another process) that's one thing, but otherwise you're going to
> generally be better off letting JtR do what it's made to do.

That's right, but there is in fact a reason to have JtR pre-apply its
mangling rules to a wordlist, yet use the resulting bigger wordlist with
JtR itself:

Although the mangling rules (in the default ruleset) are designed to
avoid producing duplicate candidate passwords, they nevertheless happen
to produce some with most large wordlists.  This is because it'd take a
lot of processing (and more memory) to completely avoid duplicates, for
all possible input wordlists.  For example, if the input wordlist has
both "word" and "Word", and the ruleset has the following rules (which
are part of the default ruleset):

[List.Rules:Wordlist]
# Try words as they are
:
# Lowercase every pure alphanumeric word
-c >3!?XlQ
# Capitalize every pure alphanumeric word
-c >2(?a!?XcQ

then JtR will try each of "word" and "Word" twice.  Of course,
converting the input wordlist to all-lowercase, as suggested in
doc/MODES, avoids this specific problem, but it may also lose valuable
information (e.g., on entries such as "O'Brien"), and there are more
subtle cases where duplicates may be produced by some combinations of
word mangling rules in the default ruleset and some wordlist entries.
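
To make the "word"/"Word" case concrete, here is a rough Python sketch of
how those three rules interact.  It is not JtR's rule engine: the function
names are made up for illustration, and the rule semantics are my simplified
reading of them (length check, "pure alphanumeric" check, case conversion,
then "Q" rejecting a word that the rule left unchanged).  The "-c" flag,
which only matters for case-insensitive hash types, is ignored.

```python
from collections import Counter


def is_pure_alnum(word):
    # Approximates "!?X": reject words containing non-alphanumeric characters.
    return word.isascii() and word.isalnum()


def rule_as_is(word):
    # ":" -- try the word unchanged.
    return word


def rule_lowercase(word):
    # ">3 !?X l Q" -- longer than 3, pure alphanumeric, lowercase,
    # rejected if lowercasing did not change the word.
    if len(word) <= 3 or not is_pure_alnum(word):
        return None
    mangled = word.lower()
    return mangled if mangled != word else None


def rule_capitalize(word):
    # ">2 (?a !?X c Q" -- longer than 2, starts with a letter, pure
    # alphanumeric, capitalize, rejected if the word did not change.
    if len(word) <= 2 or not word[:1].isalpha() or not is_pure_alnum(word):
        return None
    mangled = word[0].upper() + word[1:].lower()
    return mangled if mangled != word else None


RULES = [rule_as_is, rule_lowercase, rule_capitalize]


def candidates(wordlist):
    # Each rule is applied to the whole wordlist before moving to the next.
    out = []
    for rule in RULES:
        for word in wordlist:
            mangled = rule(word)
            if mangled is not None:
                out.append(mangled)
    return out


if __name__ == "__main__":
    for candidate, count in Counter(candidates(["word", "Word"])).items():
        print(candidate, count)
```

Each rule rejects its own no-op output, yet "word" still comes out once as
is and once as the lowercased "Word", and likewise for "Word".  An entry
like "O'Brien" is only tried as is, since it fails the alphanumeric check.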

Thus, when cracking slow hashes and/or when the number of different
salts is large, it makes sense to pre-mangle the wordlist and pass the
resulting stream of candidate passwords through "unique" - the program
included with JtR (and in fact linked into the JtR program binary,
making "unique" merely a symlink to "john").  Due to the way "unique"
works, the output has to be saved to a file.  Then the resulting file,
with no duplicates in it, may be used for the slow cracking.
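
Conceptually, the de-duplication step amounts to something like the Python
sketch below.  This only illustrates the idea and is not how "unique" is
implemented: it keeps every distinct candidate in memory, which will not
scale to very large streams, and the function names are made up.

```python
def unique_lines(lines):
    # Yield each line the first time it is seen, preserving input order.
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def write_unique(in_stream, out_path):
    # Work on raw bytes so candidates in any encoding pass through unchanged.
    with open(out_path, "wb") as out:
        out.writelines(unique_lines(in_stream))


# Example (hypothetical output name), with candidates piped in on stdin:
#   import sys
#   write_unique(sys.stdin.buffer, "mangled.lst")
```

The input would be the stream printed by "john" when run with --wordlist,
--rules and --stdout.  For real use, prefer "unique" itself, which is meant
for streams of this size.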

An example of how to invoke "john" and "unique" in this way is given in
doc/EXAMPLES.

Alexander
