john-users - Re: how charset are made ?

Message-ID: <20090725212325.GB10716@openwall.com>
Date: Sun, 26 Jul 2009 01:23:25 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: how charset are made ?

On Sat, Jul 25, 2009 at 09:49:20PM +0200, websiteaccess wrote:
>  I have generated my own alnum.chr charset (from a dictionary of cracked
> passwords), with 200 000 words.

It is not clear where those 200k "words" came from - were all of them
real passwords?  Was it possible for the same "word" to occur more than
once (such as if it were a common password)?  If you were cracking
saltless hashes, then probably you'd only get one instance of each
password, even if it matched multiple hashes...
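Here is a small Python illustration of why this matters. The account names and passwords are made up. If a password is kept once per cracked account, a common password contributes to the character statistics in proportion to how many accounts use it. If only a deduplicated list of cracked words survives, that weight is lost.

```python
from collections import Counter

def char_counts(passwords):
    """Count character occurrences across a list of passwords."""
    counts = Counter()
    for pw in passwords:
        counts.update(pw)
    return counts

# Hypothetical cracked results: three accounts share "123456" (with
# saltless hashes, one crack covers all three).
cracked = {"alice": "123456", "bob": "123456", "carol": "123456",
           "dave": "test620"}

unique_list = sorted(set(cracked.values()))  # one instance per password
weighted = list(cracked.values())            # one instance per account

print(char_counts(unique_list)["1"])  # 1
print(char_counts(weighted)["1"])     # 3
```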

>  I did a test :
> 
>  1 - original JTR's charset (alnum)
>  2 - my charset (alnum)
> 
>  The original charset is at least 3 times faster at finding the plaintext!
> The word was an easy one: "test620"
> 
>  How do you explain that ? :-/

1. The supplied .chr files are fairly good. :-)  Processing the source
material (including rejecting some of it) to generate the supplied .chr
files involved quite some effort.

2. You need to do out-of-sample testing.  Pick two non-overlapping sets
of hashes.  Use the cracked passwords for one of the sets to generate a
.chr file.  Use the hashes from the other set to test efficiency of the
generated .chr file vs. the "corresponding" one supplied with JtR.  The
two sets don't need to be of equal size - you may well use 190,000
cracked passwords to generate a .chr file and another 10,000 hashes to
test its efficiency.

3. The time to crack a single hash has no statistical significance.  You
need to run JtR on a large enough test sample - say, 10,000 hashes - and
see how many get cracked after 1 minute, 1 hour, 1 day with each of the
.chr files you test separately.

4. Another important test is to repeat #3 after having run "single
crack" and wordlist.  It is possible that one .chr file will perform
better "from scratch", but another will perform better after "single
crack" and wordlist.  The latter will likely be of more use in practice
(because you'd get a larger percentage of hashes cracked total - for all
three cracking modes combined).  So this is the case I've been optimizing a
few things for.
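For point 2 above, here is a minimal Python sketch of the split. It assumes your hashes are already loaded as a list of distinct entries. The helper name `split_out_of_sample` is hypothetical, not a JtR feature. Cracking the first set and building the .chr file from its results is left to JtR itself. Only the second set is used for the efficiency comparison.

```python
import random

def split_out_of_sample(hashes, train_size, seed=0):
    """Split hashes into two non-overlapping sets (hypothetical helper).

    Assumes each entry in `hashes` is distinct; duplicate entries could
    otherwise land in both sets.  The first set is for generating a .chr
    file, the second is held back to test it.
    """
    if not 0 < train_size < len(hashes):
        raise ValueError("train_size must leave both sets non-empty")
    shuffled = list(hashes)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    return shuffled[:train_size], shuffled[train_size:]

# Placeholder entries mirroring the 190,000 / 10,000 split above.
all_hashes = [f"hash{i}" for i in range(200_000)]
train_set, test_set = split_out_of_sample(all_hashes, 190_000)
print(len(train_set), len(test_set))  # 190000 10000
```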
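For point 3 above, here is one way to tabulate results. It assumes you have recorded, for each run, the elapsed time in seconds at which each test hash was cracked. How you collect those times is up to you, and `cracked_by_checkpoint` is a hypothetical helper. The numbers below are tiny and made up. A real comparison would use a sample of around 10,000 hashes.

```python
from bisect import bisect_right

CHECKPOINTS = {"1 minute": 60, "1 hour": 3600, "1 day": 86400}

def cracked_by_checkpoint(crack_times, sample_size, checkpoints=CHECKPOINTS):
    """Percentage of the test sample cracked by each checkpoint.

    crack_times: elapsed seconds at which each hash was cracked; hashes
    that were never cracked are simply absent from the list.
    """
    times = sorted(crack_times)
    return {
        label: 100.0 * bisect_right(times, limit) / sample_size
        for label, limit in checkpoints.items()
    }

# Hypothetical timings for two .chr files run separately on the same sample.
supplied = [5, 40, 300, 2000, 50000]
custom = [90, 4000, 70000]
for name, times in [("supplied", supplied), ("custom", custom)]:
    print(name, cracked_by_checkpoint(times, sample_size=10))
```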
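For point 4 above, here is a sketch of the comparison using hypothetical sets of cracked hash IDs. On its own, .chr file A cracks more hashes than B. Most of A's cracks, however, were already found by "single crack" and wordlist, so B yields the higher combined total. The function and data are illustrative only.

```python
def combined_coverage(test_hashes, single, wordlist, incremental):
    """Return (percent cracked by all three modes combined, number of hashes
    the incremental run added beyond single crack and wordlist)."""
    sample = set(test_hashes)
    earlier = (set(single) | set(wordlist)) & sample
    added = (set(incremental) & sample) - earlier
    return 100.0 * (len(earlier) + len(added)) / len(sample), len(added)

test_hashes = [f"h{i}" for i in range(10)]
single = {"h0", "h1"}
wordlist = {"h1", "h2", "h3"}
chr_a = {"h0", "h1", "h2", "h4"}  # 4 from scratch, mostly overlaps earlier modes
chr_b = {"h5", "h6", "h7"}        # 3 from scratch, complements earlier modes
for name, inc in [("A", chr_a), ("B", chr_b)]:
    print(name, combined_coverage(test_hashes, single, wordlist, inc))
```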

>  Does John build a charset based on word statistics?

This is not a very specific question, so I can't answer it directly.

However, I can say that, yes, statistical information is collected and
saved in .chr files.  It does not include statistics on entire words
(except unintentionally in some rare special cases), but it includes
lists of characters sorted by their estimated probabilities (derived
from the numbers of occurrences) for a given length, position, and two
preceding characters.  So it can be said that indirectly .chr files
include character triplet statistics (separately for each password
length and starting position of the triplet).

Alexander
