john-users - Re: Markov Sampling

Message-ID: <b0197463-8f60-c124-98e3-d07787c0d792@matlink.fr>
Date: Thu, 15 Feb 2018 15:10:28 +0100
From: Matlink <matlink@...link.fr>
To: john-users@...ts.openwall.com
Subject: Re: Markov Sampling

I would like to know how you calculate the level required to generate
given plaintext passwords (as you describe in option 1 below).
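
To make the question concrete, here is a minimal sketch of how I read "sum up all of the transitions" in option 1 of the quoted message. Everything in it is an assumption for illustration: the tables `first_cost` and `trans_cost`, the `unseen_cost` penalty, and the toy numbers are hypothetical and are not read from JtR's actual Markov stats file, whose format this sketch does not parse.

```python
def markov_level(password, first_cost, trans_cost, unseen_cost=100):
    """Return the summed cost of a password under a first-order Markov model.

    first_cost:  hypothetical table, cost of starting with a given character
    trans_cost:  hypothetical table, cost of moving from one character to the next
    unseen_cost: assumed penalty for anything missing from the tables
    """
    if not password:
        return 0
    # Cost of the first character.
    level = first_cost.get(password[0], unseen_cost)
    # Add the cost of each (previous, current) character transition.
    for prev, cur in zip(password, password[1:]):
        level += trans_cost.get((prev, cur), unseen_cost)
    return level


# Toy tables, invented for this example only.
first_cost = {"p": 20, "a": 25}
trans_cost = {("p", "a"): 10, ("a", "s"): 12, ("s", "s"): 15}

print(markov_level("pass", first_cost, trans_cost))
```

If this reading is right, a password is only generated once the Markov level given to JtR is at least the value returned here, and the remaining step is converting that level into a guess count.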


On 13/02/2018 at 19:16, Matt Weir wrote:
> So my initial reaction was that I'm doubtful printing out 1/10 of all
> the password guesses will give you the end results you are looking
> for. Here are a couple of other options:
>
> 1) You can always work backwards if your target plaintexts are known:
> calculate the level required to generate them. That's easy to do: sum
> up all of the transitions based on your character file. Once you have
> that, calculate how many guesses are required to get to that level,
> which the included tools in JtR will do for you. If you need
> additional help on this approach I can write up something more
> detailed. This is *super* fast and gives you a pretty good
> approximation.
>
> 2) You can rely on JtR's own logging to see when passwords would be
> cracked so that way you are not slowed down by piping the output to
> stdout or using a 3rd party tool to figure out how many passwords you
> cracked. If you want to go this route I'm sure the people on this list
> can help out.
>
> There are other options as well, I guess. You could always modify the
> Markov code directly in JtR. If you want other examples of the Markov
> code, I re-implemented it in Python for my PCFG cracker. You can see
> the guess generation code at:
>
>  https://github.com/lakiw/pcfg_cracker/blob/master/python_pcfg_cracker/pcfg_manager/markov_cracker.py
>
> Of course, my code is slow so that almost certainly is not the way to
> go, but it may be useful as a reference. Long story short, if you let
> us know more what you are trying to do I'm sure we can brainstorm some
> better options.
>
> Matt
>
>
>
> On Tue, Feb 13, 2018 at 11:47 AM, Solar Designer <solar@...nwall.com> wrote:
>> On Tue, Feb 13, 2018 at 04:44:03PM +0100, Matlink wrote:
>>>> The pre-defined --external=Parallel mode will do what you ask for.
>>>> You'll just need to customize the "node" and "total" numbers in its
>>>> init() in john.conf.
>>> Well, I guess it's only 'not printing' generated candidates? Does it
>>> really speed up the process, since generating a password candidate is
>>> more costly than printing it?
>> It doesn't speed up the processing inside JtR; it actually adds extra
>> processing.
>>
>>> Concretely, is --markov --stdout --external=Parallel with node 1/100,
>>> 100 times faster than with node 1/1?
>> No.  It's probably roughly the same speed: the external mode adds overhead
>> internally to JtR, but then those skipped candidates don't need to be
>> printed to the Unix pipe.
>>
>>>> However, note that "every 10th" doesn't necessarily produce a
>>>> representative sample: the underlying cracking mode (in this case,
>>>> Markov) might happen to have some periodicity in its output, and one of
>>>> its period lengths might just happen to be a multiple of 10 or whatever.
>>>> So ideally you'd want to randomize the order (if the order somehow
>>>> doesn't matter for your research) over a larger number of candidate
>>>> passwords - say, pass a million of them through GNU coreutils' shuf(1) -
>>>> and then take every 10th out of that randomized list.
>>> My issue is that I can't get the whole output because it is too costly
>>> for me to gather it through the UNIX pipe. I would like
>>>
>>>     john --stdout --markov --sample=100 | my_sublime_post-process
>>>
>>> to be roughly 100 times faster than
>>>
>>>     john --stdout --markov --sample=1 | my_sublime_post-process
>> You could use the built-in --node=1/100 feature, which probably will
>> speed things up a lot, but then it almost certainly doesn't result in a
>> representative sample - it's just a way to split the work between
>> multiple nodes, without regard as to whether each node would get a
>> representative sample and be expected to crack a similar percentage of
>> real-world passwords that other nodes crack or not (so this probably
>> won't be the case, making this approach unsuitable for use in research).
>>
>> The same applies to incremental mode.
>>
>>> Your solution requires getting the whole output of john and then
>>> post-processing it, but I can't find a satisfactory way to get its whole
>>> output (since john is really fast at generating candidates).
>> A question is whether you actually need to get this many candidates (or
>> a sample from this many), or whether fewer would suffice.  That depends
>> on what your ultimate goal is.
>>
>> Alexander
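
For reference, the "shuffle a block, then take every 10th" idea quoted above could be sketched as follows. This is only an illustration under assumptions: the function name `sample_blocks` and its parameters are hypothetical, and it still has to read every candidate, so it addresses the periodicity concern but not the cost of the pipe.

```python
import random
from itertools import islice


def sample_blocks(candidates, block_size=1_000_000, keep_every=10, rng=None):
    """Shuffle candidates block by block and yield every keep_every-th one.

    Shuffling inside each block breaks up any periodicity in the
    generator's output order before the 1-in-N selection is made.
    """
    rng = rng or random.Random()
    it = iter(candidates)
    while True:
        # Pull the next block of candidates from the stream.
        block = list(islice(it, block_size))
        if not block:
            return
        rng.shuffle(block)
        # Keep one candidate out of every keep_every in the shuffled block.
        yield from block[::keep_every]
```

It could be fed from standard input, for example by iterating over `sys.stdin` in a small wrapper script placed after `john --stdout --markov |` in the pipeline.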

-- 
Matlink - Sysadmin matlink.fr
Stay protected, encrypt your e-mails: https://café-vie-privée.fr/
XMPP/Jabber: matlink@...link.fr
PGP public key: 0x186BB3CA
Off-the-record fingerprint: 572174BF 6983EA74 91417CA7 705ED899 DE9D05B2