john-users - Re: How does incremental mode works?

Message-ID: <CAFMma9P2ZJwRQt6wqBJaZE=-bjpPLqUmNunCScAbLRz8+EyHFw@mail.gmail.com>
Date: Mon, 19 Nov 2012 09:39:50 -0600
From: Richard Miles <richard.k.miles@...glemail.com>
To: john-users@...ts.openwall.com
Cc: magnum <john.magnum@...hmail.com>, simon@...quise.net
Subject: Re: How does incremental mode works?

Hi Simon,

Thanks for your answer, much appreciated. I still have some questions and
suggestions, if you don't mind.

On Sat, Nov 17, 2012 at 10:23 AM, Simon Marechal <simon@...quise.net> wrote:

> On 11/16/2012 10:16 PM, Richard Miles wrote:
> > 1) Is there a command-line parameter to replace the default path of
> > $JOHN/markov.stats?
>
> I have not been following what's in jumbo for a while but I suppose
> there is a way in the config file.
>

Sorry, I was not clear in my previous e-mail. You are correct, it's
possible to specify the path in john.conf; however, I would like to pass
this parameter via the command line.

Both the documentation and the config file state that no command-line
option is available, but I'm just curious why not. I don't believe there
is a technical limitation. Is there a chance to add it to the TODO list?
Magnum does a great job and constantly improves JtR's command-line options;
can you consider it, please? :)


>
> > 2) How big should a wordlist be to generate a stats file? I mean,
> > bigger is not always better, right? But one that is too short will be
> > bad as well, right? Does the size of the generated stats file influence
> > the attack's time?
>
> I will try to give a high level description of how it works in another
> mail, but will answer this specific question here. The statistical
> generators all behave more or less the same in this regard. Once you
> have generated the stats file from a training set, you will be between
> the following extreme configurations :
> * too little data and you risk "overfitting", that means not having a
> model generic enough to find passwords that differ from the training
> set. For example, if you train it with a single word "aa", the markov

mode will only output candidates with a's in them (not entirely true).
> * with a lot of data, your model will be generic. This is a good
> situation.
> You usually want to train it with as much data as possible, provided
> that this data matches the kind of passwords you are going to attack.
>
>
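To make the overfitting point concrete, here is a minimal sketch of a first-order character model. It counts first characters and character-to-character transitions in a training list and turns each frequency into an integer cost, so rarer events cost more. The `train`/`costs` helpers and the `round(-10 * ln(p))` scaling are illustrative assumptions, not JtR's actual stats format. Also, unlike JtR (hence Simon's "not entirely true"), this sketch does no smoothing, so a model trained only on "aa" really knows nothing but 'a'.

```python
import math
from collections import Counter, defaultdict

def train(words):
    """Count first characters and character-to-character transitions."""
    first = Counter()
    trans = defaultdict(Counter)
    for w in words:
        if not w:
            continue
        first[w[0]] += 1
        for a, b in zip(w, w[1:]):
            trans[a][b] += 1
    return first, trans

def cost(count, total):
    """Rarer events cost more. The -10*ln(p) scale is an illustrative assumption."""
    return round(-10 * math.log(count / total))

def costs(first, trans):
    """Convert raw counts into first-char costs and per-context transition costs."""
    total_first = sum(first.values())
    first_cost = {c: cost(n, total_first) for c, n in first.items()}
    trans_cost = {}
    for a, nexts in trans.items():
        t = sum(nexts.values())
        trans_cost[a] = {b: cost(n, t) for b, n in nexts.items()}
    return first_cost, trans_cost

if __name__ == "__main__":
    first_cost, trans_cost = costs(*train(["aa"]))
    print(first_cost)  # {'a': 0}
    print(trans_cost)  # {'a': {'a': 0}}
```

With a single training word, every known event has probability 1 (cost 0) and every other character is simply absent from the model.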
Makes sense. However, what I noticed in practice is that, even with exactly
the same Markov options and the same target password hashes, the run time
changes a lot depending on the stats file.

For example:

A) 55 NTLM password hashes, with either the default stats or stats
generated from rockyou.txt, using --markov=240:0:0:13: the run completes
in at most 16 hours on my computer. The stats file generated from
rockyou.txt is bigger than the default one.

B) The same 55 NTLM password hashes, with a stats file generated from a
really big dictionary (~50GB) and the same option --markov=240:0:0:13,
take much longer on the same computer. The interesting thing, however, is
that this stats file is much smaller than the one generated from
rockyou.txt. Strange, isn't it?

Consequently, I believe that really big dictionaries are not a good option
with Markov. If I'm missing something, please let me know.
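One possible explanation, sketched under assumptions: a level-limited enumeration only generates candidates whose summed cost is at most LEVEL, so the number of candidates (and therefore the run time) depends on how the costs are distributed in the stats file, not on how big the training list or the stats file is. The sketch below counts candidates by dynamic programming over (previous character, remaining budget, remaining length). The `count_candidates` helper and both toy models are hypothetical, the cost scale is illustrative, and treating the fourth --markov field as the maximum length is an assumption about how 240:0:0:13 is read.

```python
from functools import lru_cache

def count_candidates(first_cost, trans_cost, level, max_len):
    """Count strings of length 1..max_len whose total cost is <= level."""

    @lru_cache(maxsize=None)
    def extend(prev, budget, remaining):
        # 1 for the string that stops here, plus every affordable extension.
        total = 1
        if remaining == 0:
            return total
        for nxt, c in trans_cost.get(prev, {}).items():
            if c <= budget:
                total += extend(nxt, budget - c, remaining - 1)
        return total

    if max_len < 1:
        return 0
    return sum(
        extend(ch, level - c, max_len - 1)
        for ch, c in first_cost.items()
        if c <= level
    )

# Hypothetical toy model A: every event has the same moderate cost.
FLAT_FIRST = {"a": 7, "b": 7}
FLAT_TRANS = {"a": {"a": 7, "b": 7}, "b": {"a": 7, "b": 7}}

# Hypothetical toy model B: "a" is very likely, "b" is very unlikely.
PEAKED_FIRST = {"a": 1, "b": 20}
PEAKED_TRANS = {"a": {"a": 1, "b": 20}, "b": {"a": 1, "b": 20}}

if __name__ == "__main__":
    print(count_candidates(FLAT_FIRST, FLAT_TRANS, 14, 13))      # 6
    print(count_candidates(PEAKED_FIRST, PEAKED_TRANS, 14, 13))  # 13
    print(count_candidates(FLAT_FIRST, FLAT_TRANS, 21, 13))      # 14
```

Same level and maximum length, different stats: 6 versus 13 candidates. Real models have many more characters, so the difference between two real stats files can be far larger than in this toy.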


>
> > 3) What kind of wordlist should I use to generate a stats file? A
> > default one such as passwords.lst? The RockYou leak? The phpBB leak?
> > All of them together?
>
> The proper wordlist is the one that looks like the passwords you want to
> attack. If this is a public leak, rockyou is your best choice. If this
> is something else, you will have to find something else ;)
>


I understand, but I still have some questions and a suggestion, if you
don't mind.

A.1) What is the minimum size (number of words) a wordlist must have to
produce an effective stats file?
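Rather than a fixed number, one way to answer this empirically is held-out validation: train on the first N words, then check what fraction of a separate sample of real passwords the model would generate under a fixed level and maximum length. When that fraction stops improving as N grows, more data of the same kind adds little. A minimal sketch with hypothetical helper names, using the same illustrative -10·ln(p) cost scale as an assumption (so its levels are not comparable to JtR's):

```python
import math
from collections import Counter, defaultdict

def train_costs(words):
    """Build first-char and transition costs from a training list."""
    first = Counter(w[0] for w in words if w)
    trans = defaultdict(Counter)
    for w in words:
        for a, b in zip(w, w[1:]):
            trans[a][b] += 1
    total_first = sum(first.values())
    first_cost = {c: round(-10 * math.log(n / total_first)) for c, n in first.items()}
    trans_cost = {}
    for a, nexts in trans.items():
        t = sum(nexts.values())
        trans_cost[a] = {b: round(-10 * math.log(n / t)) for b, n in nexts.items()}
    return first_cost, trans_cost

def password_cost(pw, first_cost, trans_cost):
    """Total cost of pw, or infinity if the model never saw one of its steps."""
    if not pw or pw[0] not in first_cost:
        return math.inf
    total = first_cost[pw[0]]
    for a, b in zip(pw, pw[1:]):
        c = trans_cost.get(a, {}).get(b)
        if c is None:
            return math.inf
        total += c
    return total

def coverage(train_words, held_out, level, max_len):
    """Fraction of held-out passwords the model would generate at this level."""
    fc, tc = train_costs(train_words)
    hits = sum(
        1 for pw in held_out
        if len(pw) <= max_len and password_cost(pw, fc, tc) <= level
    )
    return hits / len(held_out)

if __name__ == "__main__":
    train = ["password", "pass123", "letmein"]
    print(coverage(train, ["passw", "qwerty"], 100, 13))  # 0.5
    # On real data, compare coverage(words[:n], held_out, level, 13)
    # for growing n and look for the point where it stops improving.
```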

B.1) Once a new password is cracked with Markov, wouldn't it be useful to
feed it back into the stats file and recalculate the probabilities? I may
be wrong, but I guess that, with a good number of passwords cracked by
Markov, using this data to update the stats "on the fly" could give better
results, no?

C.1) Also, speaking of modifying the stats file: the stats file that comes
with JtR is not based on rockyou.txt, but it's great. However, the password
list used to generate it is not publicly available (AFAIK). Would it be
possible to take an existing stats file, read a new dictionary file, and
generate an updated stats file that combines the original stats with the
new wordlist? I think it could be a very nice feature. :)
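If an existing stats file only holds final probabilities (or costs derived from them) and not the raw counts, an exact merge is not possible, but a common approximation is linear interpolation: mix the old distribution with the one learned from the new wordlist using a chosen weight, then renormalize. A minimal sketch over per-character probability dicts; the `mix`/`mix_conditional` helpers and the weight are hypothetical, and recovering probabilities from JtR's actual stats file format is not shown.

```python
def mix(old, new, weight_old):
    """Interpolate two {char: probability} dicts and renormalize."""
    keys = set(old) | set(new)
    mixed = {
        k: weight_old * old.get(k, 0.0) + (1.0 - weight_old) * new.get(k, 0.0)
        for k in keys
    }
    total = sum(mixed.values())
    # Renormalize so a context known to only one model still sums to 1.
    return {k: v / total for k, v in mixed.items()} if total > 0 else mixed

def mix_conditional(old, new, weight_old):
    """Apply mix() to every 'previous character' context of a transition table."""
    contexts = set(old) | set(new)
    return {a: mix(old.get(a, {}), new.get(a, {}), weight_old) for a in contexts}

if __name__ == "__main__":
    old_first = {"p": 0.5, "1": 0.5}   # e.g. recovered from the existing stats
    new_first = {"p": 1.0}             # e.g. computed from the new wordlist
    print(sorted(mix(old_first, new_first, 0.5).items()))  # [('1', 0.25), ('p', 0.75)]
```

The weight controls how much the original training data is trusted relative to the new wordlist; 0.5 treats them equally regardless of their sizes.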

D.1) This one is for Magnum again, since he keeps improving JtR with
amazing small features that make our work easier. Today, choosing the
Markov parameters for a given run time is a bit of a pain, as described
here: http://openwall.info/wiki/john/markov

Do you think you could add a new command-line option to automate it? For
example, maybe we could do something like
--markov=autoadjust-10800:0:0:13, where JtR would calculate the best
Markov level itself, based on the current cracking speed for the target
hash, and adjust it to run for 3 hours (10800 seconds). What do you think?
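Since the number of candidates can only grow as the level grows, such an option could, in principle, take a measured speed and binary-search the largest level whose candidate count fits into speed × seconds, automating the manual calculation. A minimal sketch with hypothetical names (`pick_level`, the toy model) and an illustrative cost scale; it includes a small dynamic-programming candidate counter so it runs on its own, ignores the start/end fields, and assumes the speed stays constant during the run, which may not hold in practice.

```python
from functools import lru_cache

def count_candidates(first_cost, trans_cost, level, max_len):
    """Count strings of length 1..max_len whose summed cost is <= level."""

    @lru_cache(maxsize=None)
    def extend(prev, budget, remaining):
        total = 1  # the string that stops at `prev`
        if remaining == 0:
            return total
        for nxt, c in trans_cost.get(prev, {}).items():
            if c <= budget:
                total += extend(nxt, budget - c, remaining - 1)
        return total

    if max_len < 1:
        return 0
    return sum(extend(ch, level - c, max_len - 1)
               for ch, c in first_cost.items() if c <= level)

def pick_level(first_cost, trans_cost, max_len, speed, seconds, max_level=1000):
    """Largest level whose candidate count fits in speed * seconds (None if none)."""
    budget = speed * seconds
    lo, hi, best = 0, max_level, None
    while lo <= hi:
        mid = (lo + hi) // 2
        if count_candidates(first_cost, trans_cost, mid, max_len) <= budget:
            best, lo = mid, mid + 1   # fits: try a higher level
        else:
            hi = mid - 1              # too many candidates: go lower
    return best

# Hypothetical toy model: two characters, every event costs 7.
TOY_FIRST = {"a": 7, "b": 7}
TOY_TRANS = {"a": {"a": 7, "b": 7}, "b": {"a": 7, "b": 7}}

if __name__ == "__main__":
    # speed is in candidates per second, as measured against the target hashes
    print(pick_level(TOY_FIRST, TOY_TRANS, 13, speed=1, seconds=14))  # 27
```

For a 3-hour run, `seconds` would be 10800 and `speed` the measured rate; the binary search needs only about log2(max_level) candidate counts.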

Thanks a lot, and sorry for so many dumb questions / suggestions.