Date: Wed, 15 May 2024 23:50:45 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Wed, May 08, 2024 at 07:12:47AM -0400, Matt Weir wrote:
> First, Mask mode is only in the basic sense a Markov chain. With Hashcat at
> least, it's more letter frequency sorted (vs. alphabetically sorted). Aka
> instead of 'abcdef' it is 'aeionr', BUT the order of the second character
> isn't influenced by the first character. I assume JtR is the same way but
> that's probably worth double-checking as I've certainly been wrong before.

Yes, I wouldn't call our mask mode a Markov chain at all.  Not even
0th-order.  That's because while we do the character frequency sorting,
we not only have no chains from one character to another (which could
have been 0th-order, or a degenerate case of a Markov chain), but we
also iterate the indices in simple positional numeral system order.
This is even dumber than JtR 1.0's incremental mode, which I do call
0th-order Markov.
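
To make the distinction concrete, here's a minimal C sketch of that
kind of iteration - an odometer over a frequency-sorted character set
(the charset string below is made up for illustration).  Each
position's ordering is fixed and never conditioned on the characters
already chosen, which is why no Markov property is involved:

#include <stdio.h>

int main(void)
{
    const char cs[] = "aeionrlst"; /* frequency-sorted; illustrative only */
    const int n = sizeof(cs) - 1, len = 3;
    int idx[3] = {0, 0, 0}, pos;
    char word[4] = {0};

    for (;;) {
        for (pos = 0; pos < len; pos++)
            word[pos] = cs[idx[pos]];
        puts(word);
        /* Positional numeral system increment: rightmost index fastest,
         * carry to the left - no dependence between positions */
        for (pos = len - 1; pos >= 0 && ++idx[pos] == n; pos--)
            idx[pos] = 0;
        if (pos < 0)
            break;
    }

    return 0;
}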

I was hoping hashcat's was better than that - is it not?

We have Markov chains, or something like them, in incremental mode and
in Markov mode.  Externally, there's also OMEN.

> None of this answers your question though! I'm not aware of a way to
> bootstrap passphrase generation into existing JtR Markov modes due to the
> huge size of the character set. Aka you are looking at an 'alphabet' of 10k
> or more values.

I'm not that familiar with JtR Markov mode's internals (I didn't write
that code), but I am with incremental mode's (which I wrote).  It is
possible to modify the code such that it would handle the much larger
"alphabet", but to keep memory requirements sane we'd then need to
reduce the number of conditions - in other words, the Markov chain
order.  Incremental mode is currently like a 2nd-order Markov chain
since it considers 2 preceding characters (it works on 3-grams).
However, it also considers the current length and character position,
making it kind of 4th-order.  We can reduce some of this to allow the
larger "alphabet".
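
For a sense of scale, here's a tiny C sketch of how the conceptual
table size grows with the token count.  It models the statistics as a
(length, position, prev2, prev1) -> next-token ranking; the actual
.chr format and code differ, and the numbers are purely illustrative:

#include <stdio.h>

static unsigned long long cells(unsigned long long alphabet)
{
    unsigned long long maxlen = 13; /* illustrative maximum length */
    /* (length, position, prev2, prev1) -> ranking over next tokens */
    return maxlen * maxlen * alphabet * alphabet * alphabet;
}

int main(void)
{
    printf("95-char alphabet:   %llu cells\n", cells(95));
    printf("10k-token alphabet: %llu cells\n", cells(10000));
    /* Dropping one condition (e.g., one preceding token) divides the
     * count by the alphabet size - the trade-off described above */
    return 0;
}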

But I am lazy and I'd like to try my other idea that I mentioned in the
talk and in another message here earlier today - mapping frequent
substrings (likely syllables) to/from an "alphabet" small enough to fit
the existing incremental and Markov modes and OMEN and whatever NN-based
models people try.  I think we need to do this anyway, to have another
baseline for comparisons and another stream of candidates that (within a
realistic number of initial candidates) will only partially overlap with
a future full per-word Markov chain's (so it will remain useful as
well).
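
Here's a toy C sketch of the decode half of that idea, with a made-up
token table (in practice the frequent substrings would be mined from
training data and there'd be thousands of them).  The existing modes
would see only the single-byte codes; encoding the training corpus is
the mirror image of this:

#include <stdio.h>
#include <string.h>

/* Made-up frequent substrings; code 'a' + i stands for tokens[i] */
static const char *tokens[] = { "the", "ing", "pass", "word", "er", "an" };
#define NTOKENS (sizeof(tokens) / sizeof(tokens[0]))

/* Expand a candidate generated over the small "alphabet" back into a
 * passphrase, skipping any codes outside the token table */
static void decode(const char *enc, char *out, size_t outsize)
{
    out[0] = 0;
    for (; *enc; enc++) {
        unsigned int i = (unsigned char)*enc - 'a';
        if (i < NTOKENS && strlen(out) + strlen(tokens[i]) < outsize)
            strcat(out, tokens[i]);
    }
}

int main(void)
{
    char buf[64];
    decode("cdb", buf, sizeof(buf)); /* "pass" + "word" + "ing" */
    puts(buf);                       /* prints "passwording" */
    return 0;
}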

> Now if you don't want to do conditional probability and instead do
> more of an 'individual words, frequency sorted' approach like mask
> mode, that is a lot easier to do. I wouldn't be surprised if there is
> an external mode in JtR to do just this already.

This wasn't convenient to do from an external mode because of external
modes' too limited interfacing with the rest of JtR.  We offered two
kinds of modes: standalone generators and filters.  The former would
have to have the wordlist embedded in the code, which is cumbersome.
The latter can produce at most one output word per input word (filtering
or modifying it).  This changed when JimF introduced hybrid external
modes in 2016 - that's two extra callbacks.  So a candidate passphrase
generator as an external mode is more reasonably implementable now, but
it hasn't been done yet.
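
For illustration, here's a standalone C sketch of what such a hybrid
generator amounts to - several derived phrases out per base word in.
This is plain C rather than the external mode language itself, and the
filler words are made up:

#include <stdio.h>

static const char *fillers[] = { "the", "my", "a" };
#define NFILLERS (sizeof(fillers) / sizeof(fillers[0]))

/* Roughly what a per-word callback pair amounts to: take one base word
 * from the underlying mode, emit several candidates derived from it */
static void emit_candidates(const char *base)
{
    size_t i;
    for (i = 0; i < NFILLERS; i++)
        printf("%s %s\n", fillers[i], base);
}

int main(void)
{
    char base[128];
    while (scanf("%127s", base) == 1)
        emit_candidates(base);
    return 0;
}

One could pipe such a generator's output into john --stdin today; the
hybrid external mode callbacks would let the same logic run inside JtR
instead.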

Alexander
