passwords - Re: better machine learning for password guesses



Message-ID: <20171028125135.GA27141@openwall.com>
Date: Sat, 28 Oct 2017 14:51:35 +0200
From: Solar Designer <solar@...nwall.com>
To: passwords@...ts.openwall.com
Subject: Re: better machine learning for password guesses

On Sat, Oct 28, 2017 at 02:58:51PM +0300, ArkanoiD wrote:
> Wow, someone should have done it!

And people did.  The line between what is and isn't "machine learning"
is fuzzy.  In a sense, JtR's incremental mode in the 1990s was already
learning: its .chr files store statistics from previously-cracked
passwords to adjust the order in which further passwords are tested.
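To illustrate the general idea of "learning" from cracked passwords to
order further guesses, here is a toy sketch.  It is NOT JtR's .chr file
format or its incremental mode algorithm, which use more elaborate
statistics; it only counts per-position character frequencies.  The
function names (train, score, guesses) are hypothetical, and sorting
the entire keyspace is feasible only for tiny examples; real tools
generate candidates in order without enumerating everything first.

```python
import math
from collections import Counter
from itertools import product


def train(cracked):
    # Per-position character counts, learned from already-cracked passwords.
    counts = {}
    for pw in cracked:
        for i, ch in enumerate(pw):
            counts.setdefault(i, Counter())[ch] += 1
    return counts


def score(candidate, counts):
    # Log-probability of a candidate under the per-position model.
    total = 0.0
    for i, ch in enumerate(candidate):
        c = counts.get(i, Counter())
        if c[ch] == 0:  # character never seen at this position
            return float("-inf")
        total += math.log(c[ch] / sum(c.values()))
    return total


def guesses(counts, length, limit):
    # Use only characters seen at each position, most likely candidates first.
    alphabets = [sorted(counts.get(i, Counter())) for i in range(length)]
    candidates = ("".join(p) for p in product(*alphabets))
    ranked = sorted(candidates, key=lambda s: score(s, counts), reverse=True)
    return ranked[:limit]


counts = train(["abc", "abd", "xbc"])
print(guesses(counts, 3, 4))  # ['abc', 'abd', 'xbc', 'xbd']
```

Note that "xbd" was never cracked, yet the model proposes it: the
statistics generalize beyond the training set, which is the point of
such modes.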

> https://arxiv.org/pdf/1709.00440.pdf

Discussed in some detail in this thread (click "thread-next"):

http://www.openwall.com/lists/john-users/2017/09/26/2

People say this work is similar to, but inferior to, earlier CMU work;
I haven't compared the two myself.

> (tl;dr: 18-24% improvement over traditional techniques)

Where "traditional techniques" means what this paper's authors tried,
not being into password cracking themselves.  Someone who cracks
passwords daily would do much better by running several of those
"traditional" attacks, and running them smarter.  Arguably, part of the
problem here is that password cracking tools' defaults alone (e.g.,
bundled word mangling rule sets) are not what people who are actually
into password cracking use.

The paper doesn't even mention JtR's incremental mode.  Arguably, this
is in part my fault: now that JtR also has a mode literally called
Markov, that mode draws the attention of research like this, and people
don't realize that incremental mode is similarly relevant.  Maybe I
should have named it differently.

Also relevant for real-world usage is the amount of time it takes to
generate a guess.

It is quite natural that the authors of a tool would use it more
effectively than they'd use a third-party tool.  Happens all the time.

> But it's quite logical that deep learning should work better at some point
> than careful manual inventorying heuristics used by humans.

The traditional techniques, which I think actually still work better,
are not limited to "careful manual inventorying heuristics used by
humans" - rather, they're a mix of automated analysis/"learning" and
manual work.

If someone using a neural network wins (or gets anywhere close to
winning) one of the many password cracking contests, that would say
something.  However, there's little overlap between the academic
community and the teams competing in those contests.  Also, while it's certainly
possible that a contest team would use a neural network among other
techniques, at this time I find it unlikely that this would contribute
significantly to the team's overall score (on top of the traditional
techniques they'd also use).

Alexander
