john-users - Re: Markov phrases in john

Date: Wed, 8 May 2024 07:12:47 -0400
From: Matt Weir <cweir@...edu>
To: "john-users@...ts.openwall.com" <john-users@...ts.openwall.com>
Subject: Re: Markov phrases in john

I’m typing this response on my phone so my apologies as this great question
deserves a much longer response.

First, mask mode is a Markov chain only in the most basic sense. With
Hashcat at least, it is closer to letter-frequency sorted (as opposed to
alphabetically sorted). That is, instead of abcdef… it is aeionr…, BUT the
order of the second character isn’t influenced by the first character. I
assume JtR works the same way, but that’s probably worth double-checking as
I’ve certainly been wrong before.

The reason I make this nitpicky distinction is that adding conditional
probability between terminals (letters/words/etc) adds a lot of complexity
to how the model is trained and how guesses are generated. There is a
reason why JtR has two completely different Markov modes (--markov and
--single) that behave very differently.
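
As a rough illustration of that distinction (a sketch only, not how Hashcat
or JtR implement anything): the Python below trains on a tiny made-up corpus
and compares a plain frequency ranking of words with a ranking conditioned
on the previous word. The function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train(lines):
    """Count single words and, per first word, the words that follow it."""
    unigrams = Counter()
    followers = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        unigrams.update(words)
        for first, second in zip(words, words[1:]):
            followers[first][second] += 1
    return unigrams, followers

def rank(counter):
    """Most frequent first; ties broken alphabetically for repeatability."""
    return [w for w, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))]

# Made-up training data, purely for illustration.
corpus = ["new york", "new york", "new jersey", "york city",
          "city hall", "city hall", "city hall"]
unigrams, followers = train(corpus)

# Frequency sorted: the same order is used whatever the previous word was.
print(rank(unigrams))          # ['city', 'hall', 'new', 'york', 'jersey']
# Conditional: the order of the second word depends on the first word.
print(rank(followers["new"]))  # ['york', 'jersey']
```

After "new", the frequency-sorted order would try "city" first, while the
conditional order tries "york" first. The price is one table per preceding
word, which is where the extra training and generation complexity comes from.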

None of this answers your question though! I’m not aware of a way to
bootstrap passphrase generation into the existing JtR Markov modes, because
the character set would be huge: you are looking at an “alphabet” of 10k or
more values. But there are a lot of open source options whose output you
can pipe into JtR via --stdin or --pipe mode. The problem is that most open
source examples don’t natively generate guesses in any form of probability
order, so they require some additional framework to step through their
guess generation in a manner that fits into a password cracking session.
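
A minimal sketch of what such an external generator could look like, in
Python: it counts adjacent word pairs in a training text and prints them,
most common pair first, one guess per line. The script name, the corpus and
hash file names, and the function name are all hypothetical, and it only
ever emits pairs that actually occur in the training text.

```python
import sys
from collections import Counter

def pair_guesses(lines, delimiter=""):
    """Yield two-word guesses, most frequently seen adjacent pair first."""
    pairs = Counter()
    for line in lines:
        words = line.lower().split()
        pairs.update(zip(words, words[1:]))
    # Highest count first; ties broken alphabetically so runs are repeatable.
    ranked = sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0]))
    for (first, second), _count in ranked:
        yield first + delimiter + second

# Command-line use: first argument is the corpus file (hypothetical name),
# optional second argument is the word delimiter.
if __name__ == "__main__" and len(sys.argv) >= 2:
    delimiter = sys.argv[2] if len(sys.argv) > 2 else ""
    with open(sys.argv[1], encoding="utf-8", errors="replace") as corpus:
        for guess in pair_guesses(corpus, delimiter):
            print(guess)
```

Saved as gen_pairs.py, it could be fed into a session along the lines of:

python3 gen_pairs.py corpus.txt | ./john --stdin hashes.txt

Sorting every pair in memory is fine for a small corpus; a large one would
need the kind of additional framework mentioned above.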

Now if you don’t want conditional probability and instead want something
more like “individual words, frequency sorted”, the way mask mode treats
letters, that is a lot easier to do. I wouldn’t be surprised if there is
already an external mode in JtR that does just this.
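
A rough Python sketch of that simpler approach, under the assumption that a
frequency-ranked word list is built from some training text: keep the top N
words and walk every combination, with the first word varying fastest, the
same pattern as the aa, ea, ia output quoted below. This is not an existing
JtR mode; the function names are hypothetical.

```python
from collections import Counter
from itertools import product

def top_words(lines, limit):
    """Return the `limit` most frequent words, most common first."""
    counts = Counter(w for line in lines for w in line.lower().split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:limit]]

def phrases(words, length, delimiter=""):
    """Yield every `length`-word phrase; no word influences its neighbours."""
    # product() varies its last element fastest, so reverse each tuple to
    # make the first position the fastest-changing one.
    for combo in product(words, repeat=length):
        yield delimiter.join(reversed(combo))

# Letters stand in for words here to make the ordering easy to see.
print(list(phrases(["a", "e", "i"], 2))[:4])  # ['aa', 'ea', 'ia', 'ae']
```

The number of guesses is N to the power of the phrase length, which is why a
smaller N makes sense for three or four words than for two.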

Cheers,
Matt/lakiw

On Wednesday, May 8, 2024, Albert Veli <albert.veli@...il.com> wrote:

> Hi, as many of you know a mask will not try combinations of characters
> in alphabetical order but rather in the most likely to least likely order
> using something like Markov chains:
>
> ./john --stdout --mask='?l?l'
> aa
> ea
> ia
> oa
> na
> ra
> la
> sa
> ...
>
>
> This is useful to find human-created passwords early. Nowadays it is more
> and more popular to use combinations of words to create passwords. Would
> it be possible to use Markov or similar to traverse entire words from a
> wordlist and use the most common pair of adjacent words from the list
> first, then the second most common and so on?
>
> Like Markov does for individual characters, but on entire words instead?
> I hope you understand what I mean. Then maybe extend this to three
> words. It is possible with the '?l?l?l' mask so in some way it should be
> possible to do for entire words too. Ideally there would be an option to
> specify word delimiter too. Maybe even an option to provide a corpus text
> to train the chains on. Then an option to specify how many words to
> include in the guesses, the top 100 words, the top 500 words or the top
> 2000 words and so on. For two word combinations you can use a larger
> number and for three or four words, smaller numbers.
>
> What do you think? Would this be useful, or is it possible now already?
>
>
> Regards,
>
> Albert
>