Date: Sat, 14 Apr 2012 00:46:58 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: .chr files (Was: automation equipped working place
 of hash cracker, proposal)

On Fri, Apr 13, 2012 at 08:51:31PM +0200, Frank Dittrich wrote:
> On 04/13/2012 08:08 PM, magnum wrote:
> > On 04/13/2012 04:39 PM, Aleksey Cherepanov wrote:
> >> It is common to rebuild chr files to improve incremental mode having some
> >> passwords cracked.
> > 
> > This is common and often very rewarding. What we should not forget
> > though, is that this will emphasize the errors we made in the first
> > case. Suppose we crack 30% of the passwords but for some reason we
> > almost always miss character 'z' (in real life it may be a handful or
> > more of 8-bit or UTF-8 characters) which (very) theoretically could be
> > present in 50% of the total. After rebuilding chr-files we are
> > amplifying this error and will try even fewer (perhaps none) candidates
> > containing character 'z'. And so on.
> 
> Optimal usage of incremental mode indeed is a complex topic.
> 
> Other problems with repeatedly generating new .chr files are:
> 
> 1. If you already used incremental mode with another .chr file for a
> while before you build a new .chr file and restart incremental mode with
> the new file, you'll inevitably try a certain amount of candidate
> passwords again which have already been tried before.
> This is even more the case if you repeatedly recreate new .chr files
> based on passwords cracked previously.

We could filter out candidates that we have already tried. Though that does
not solve the problem; it only reduces it a bit (if it is effective at all).

> A solution could be to generate a .chr file once after a reasonable
> amount of passwords have been cracked.

I think it is possible to calculate how much is reasonable: early candidates
are closer to the pattern than later ones, so a success with an early
candidate affects the regenerated .chr file less than a success with a later
candidate. If we could measure the effect of each candidate that turned out to
be a password and compare the total effect with the cost of a restart, then we
could restart at precisely the right moment. Though an approximate estimate
could be enough.

> If you later on generate new .chr files, you can start new incremental
> mode sessions and run them as long as they are effective (due to newly
> discovered important tri-graph character sequences, compared with the
> previously created .chr files).
> But when the new incremental mode session gets less effective, it might
> be better to continue using the older incremental mode session which
> already covered a larger part of the total key space.
> 
> 2. If you detect a pattern like passwords based on dates, e.g.
> 12/10/1989, and you try all candidate passwords of this pattern, you
> should filter out all passwords of this pattern before generating a .chr
> file.
> Otherwise your .chr file will be biased, and password candidates of the
> pattern DD/MM/YYYY or MM/DD/YYYY will become more likely, even if none
> of those passwords will crack any remaining hashes.
> The same applies if someone already completed incremental mode with
> digits.chr.
> In this case, you should filter out all passwords consisting only of
> digits, to avoid a bias towards digits which will not be justified.

We did this during the contest and practice proved its effectiveness. So it
would be nice to filter already cracked patterns out of further cracking. But
if we drop some pattern completely and it was a subpattern of a larger one,
then the real pattern becomes much harder to find. Hence the proposal not to
drop a pattern totally but rather to reduce its effect (for instance by also
keeping the older incremental mode session in use, as described above). It
also makes the question of the quality of the patterns found more important.
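
A minimal sketch of the filtering step Frank describes, assuming the cracked
plaintexts are already available as a list of strings. The function name and
the two regular expressions are illustrative only and are not part of John the
Ripper; the real set of exhausted patterns depends on what has been tried.

```python
import re

# Hypothetical patterns assumed to be exhausted already by other attacks.
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")  # DD/MM/YYYY or MM/DD/YYYY
DIGITS_RE = re.compile(r"\d+")              # digits-only passwords


def split_by_exhausted_patterns(passwords, patterns=(DATE_RE, DIGITS_RE)):
    """Return (kept, dropped).

    A password is dropped when it matches one of the patterns in full;
    everything else is kept as input for the new .chr file.
    """
    kept, dropped = [], []
    for pw in passwords:
        if any(p.fullmatch(pw) for p in patterns):
            dropped.append(pw)
        else:
            kept.append(pw)
    return kept, dropped


if __name__ == "__main__":
    cracked = ["12/10/1989", "letmein", "20120414", "pass/word", "abc123"]
    kept, dropped = split_by_exhausted_patterns(cracked)
    print("kept:   ", kept)
    print("dropped:", dropped)
```

Returning the dropped passwords as well, instead of discarding them, fits the
idea of reducing a pattern's effect rather than removing it: the dropped list
can be set aside and mixed back in with a lower weight if that turns out to be
useful.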

> After thinking about it again, this might be at least similar to the
> point magnum made:
> If you do have a bias in the passwords which serve as input for
> generating a .chr file, using this .chr file for incremental mode will
> increase that bias. If this bias doesn't exist in the passwords which
> are still uncracked, you'll have generated a less than optimal .chr file.

With an exact pattern we could build one .chr for the pattern and one .chr for
the remaining part. With 2 .chr files we crack faster because both attacks are
more precise. A bias is an inexact pattern. Could we describe that bias and
separate it from the remaining part somehow?

Assume that we have a mix of passwords of two patterns. We build a .chr file
and label each password with a number according to its position in the list of
candidates this .chr file produces. We drop one password from our set and redo
the steps, and the numbers change: if the ratio between the biggest group of
passwords and the smallest group is higher than before, then it was a password
from the smallest group, otherwise it was a password from the biggest group. I
am not sure how to measure the numbers properly.

Though I think there could be other statistical methods to find groups of
passwords. Something like cluster analysis comes to mind.
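
As a very crude stand-in for cluster analysis, one could group cracked
passwords by their character-class structure. This is only a sketch under that
assumption: the names mask_of and group_by_mask are made up for illustration,
and exact-structure grouping is much simpler than a real statistical method,
which would need a distance measure between passwords.

```python
from collections import defaultdict


def mask_of(password):
    """Map each character to a class symbol:
    l = lowercase, u = uppercase, d = digit, s = anything else."""
    out = []
    for ch in password:
        if ch.islower():
            out.append("l")
        elif ch.isupper():
            out.append("u")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append("s")
    return "".join(out)


def group_by_mask(passwords):
    """Return a list of (mask, passwords) pairs, largest group first."""
    groups = defaultdict(list)
    for pw in passwords:
        groups[mask_of(pw)].append(pw)
    # sorted() is stable, so groups of equal size keep first-seen order.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


if __name__ == "__main__":
    cracked = ["12/10/1989", "01/02/2003", "letmein", "Abc123"]
    for mask, members in group_by_mask(cracked):
        print(mask, members)
```

Large groups with an unusual mask (such as the date-like one in the example)
would be candidates for a separate .chr file or for filtering.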

> 3. Generating a .chr file which is appropriate for different hash
> algorithms is very hard.
> Not just because you'll probably have tried a larger set of patterns
> like DD/MM/YYYY or digit-only passwords for fast, saltless hashes, but
> also because password hash algorithms have different properties like
> maximum password length, usage of 8 bit characters, distinction of upper
> and lower case characters.

During the contest we had the problem that the slowest hash types had very
different patterns altogether. So separate generation could be necessary. But
we cannot fully avoid the following workflow: find a pattern on fast hashes
and then use it on slow ones. That is likely in real life (and maybe in future
contests, and even in the previous contest for the slower, but not the
slowest, hashes).

Regards,
Aleksey Cherepanov
