Date: Thu, 29 Oct 2009 02:45:33 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: wordlist generation

On Sat, Oct 24, 2009 at 02:43:59AM +0200, SL wrote:
> What is the recommended/preferable method to convert an arbitrary
> text file (SQL dump, con-'cat'-enated HTML files, Wikipedia XML  
> export, not a precompiled dictionary) into a (reasonably usable) john  
> wordlist?
> 
> cat $textfile | tr -s -c "[:alpha:]\-??????????????" "\n" | ./unique wordlist.lst
> kind of works, but I wonder if there are better ways?

You're on the right track.

When I need something like this, I generally try to combine several
approaches.  Specifically, I pass the input files through several
different tr's, splitting up "words" on different characters - e.g., in
one of the invocations a dash will be a delimiter, but in another it
will be part of the target "word".
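As a rough sketch of the multi-pass idea (not the exact commands used for
any real job): the same input is split twice, once with the dash as a
delimiter and once with the dash kept inside the "word", and the results
are merged.  The file names and the sample line are made up for
illustration, and "sort -u" is used only so the example runs without JtR.

```shell
#!/bin/sh
# Sketch: split one input into "words" with two different delimiter sets.
# sample.txt and words.lst are hypothetical names used only in this example.
set -eu
workdir=$(mktemp -d)
printf 'A well-known pass-phrase, e.g. foo_bar.\n' > "$workdir/sample.txt"

{
    # Pass 1: only letters form words, so the dash acts as a delimiter.
    tr -cs '[:alpha:]' '\n' < "$workdir/sample.txt"
    # Pass 2: letters and the dash form words, so "well-known" survives.
    tr -cs '[:alpha:]-' '\n' < "$workdir/sample.txt"
} | sed '/^$/d' | sort -u > "$workdir/words.lst"   # drop empty lines, merge

cat "$workdir/words.lst"
```

More passes can be added inside the braces, each with its own idea of
which characters belong to a word.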

When processing files of a known format, such as SQL dumps, I may also
use "sed" to extract and un-escape the values - e.g., for proper
handling of apostrophes and backslashes embedded into the values vs.
those added for the SQL dump.
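A minimal sketch of the extract-and-un-escape step, under several
assumptions: the dump uses MySQL-style escaping (\' and \\), values are
single-quoted, and "grep -oE" (available in GNU and BSD grep) is
acceptable for pulling out the quoted values before "sed" removes the
escaping.  The sample row and file names are hypothetical, and sequences
such as \n are not given special treatment here.

```shell
#!/bin/sh
# Sketch: pull single-quoted values out of an SQL dump and un-escape them.
set -eu
workdir=$(mktemp -d)

# Hypothetical one-line sample in MySQL-style dump syntax.
cat > "$workdir/sample.sql" <<'EOF'
INSERT INTO users VALUES (1,'O\'Brien','C:\\temp');
EOF

# ERE for one quoted value: an apostrophe, then any run of ordinary
# characters or backslash-escaped characters, then the closing apostrophe.
cat > "$workdir/value.re" <<'EOF'
'([^'\\]|\\.)*'
EOF

# sed script: strip the enclosing apostrophes, then remove one level of
# backslash escaping (\' becomes ', \\ becomes \).
cat > "$workdir/unescape.sed" <<'EOF'
s/^'//
s/'$//
s/\\\(.\)/\1/g
EOF

grep -oE -f "$workdir/value.re" "$workdir/sample.sql" |
    sed -f "$workdir/unescape.sed" |
    sort -u > "$workdir/values.lst"

cat "$workdir/values.lst"
```

Keeping the patterns in separate files avoids nesting apostrophes and
backslashes inside shell quoting, which is where this kind of script
usually goes wrong.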

Then the resulting stream is passed through "sort -u" or "sort | uniq"
(the standard Unix commands) or "unique" (the program included with
JtR).  "unique" tends to be quicker because it does not need to do any
sorting.  However, when the input data was not sorted in a meaningful
way, it may be better to have the resulting wordlist sorted
alphabetically, as that lets some optimizations in JtR work: detecting
effective duplicates when the hash type truncates passwords at a
certain length, and speeding up DES key setup.  On the other hand, if
the hashes are fast to compute and you do not intend to apply many
rules to your wordlist, you may choose to save time on generating the
wordlist and use the quicker "unique".
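To illustrate the two choices side by side, here is a small sketch.  The
candidate words and file names are invented.  "./unique" is assumed to
sit in the current directory, as in the quoted command; when it is not
there, an awk one-liner stands in for it (same order-preserving
de-duplication, but it is not the JtR program).  LC_ALL=C is an
assumption made here to get plain byte-wise ordering regardless of
locale.

```shell
#!/bin/sh
# Sketch: sorted vs. order-preserving duplicate removal.
set -eu
workdir=$(mktemp -d)
printf '%s\n' zebra apple zebra mango apple > "$workdir/candidates.txt"

# Option 1: sorted and de-duplicated (slower on big inputs, but a sorted list).
LC_ALL=C sort -u "$workdir/candidates.txt" > "$workdir/sorted.lst"

# Option 2: de-duplicated, first occurrences kept in their original order.
if [ -x ./unique ]; then
    # JtR's "unique" reads stdin and writes to the named output file.
    ./unique "$workdir/unsorted.lst" < "$workdir/candidates.txt"
else
    # Stand-in for "unique" when JtR is not at hand.
    awk '!seen[$0]++' "$workdir/candidates.txt" > "$workdir/unsorted.lst"
fi

cat "$workdir/sorted.lst" "$workdir/unsorted.lst"
```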

BTW, "unique" can be made even quicker by increasing the values of
UNIQUE_HASH_LOG and UNIQUE_BUFFER_SIZE in params.h and rebuilding.  The
defaults are rather conservative (using around 9 MB of RAM).

Alexander

