Date: Thu, 18 Aug 2016 13:47:08 -0400
From: Matt Weir <cweir@...edu>
To: "john-users@...ts.openwall.com" <john-users@...ts.openwall.com>
Subject: Re: neural networks

I've gone through the paper a couple of times, though I haven't looked
through the code yet. The way they use John the Ripper is in wordlist
mode, with the SpiderLabs rule set providing the mangling rules. For the
wordlist (quoting directly from the paper; a rough sketch of such a run
follows the excerpt):

"We explore two different sets of training data. We term the first set the
Password Guessability Service (PGS) training set, used by prior work [89].
It contains the Rockyou [90] and Yahoo! [43] leaked password sets. For
guessing methods that use natural language, it also includes the web2 list
[11], Google web corpus [47], and an inflection dictionary [78]. This set
totals 33 million passwords and 5.9 million natural-language words"
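
Roughly, a wordlist-mode run like theirs would look like the sketch
below (Python driving the john binary; the file names are placeholders,
and it assumes a jumbo build with the SpiderLabs rules installed in
john.conf under a [List.Rules:SpiderLabs] section):

    # Minimal sketch: JtR wordlist mode with a named rule set.
    # "wordlist.txt" and "hashes.txt" are placeholder file names.
    import subprocess

    subprocess.run(
        [
            "john",
            "--wordlist=wordlist.txt",  # e.g. the PGS training words
            "--rules=SpiderLabs",       # named rule section in john.conf
            "hashes.txt",
        ],
        check=True,
    )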

For their Markov modes they use their own tools, not JtR.

The advantage of using a neural network is the small size of the model
needed to assign a probability to a user's password. You don't have to
include the full input dictionary (for example, if you were using a
reverse-mangling approach), or a full grammar if you were using a PCFG.
Even a full Markov character set would be much larger if you are
training 5- or 6-gram Markov models. I'd argue that John's incremental
training sets are smaller, though.
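
To put rough numbers on that size argument: the n-gram count follows
directly from the alphabet size, while the network size below is an
assumed ballpark for a small character-level model, not a figure from
the paper.

    # Back-of-envelope model-size comparison.
    # nn_params is an assumed round number for a small character-level
    # network, NOT taken from the paper.
    alphabet = 95                  # printable ASCII characters
    table_entries = alphabet ** 6  # dense 6-gram transition table
    nn_params = 3_000_000          # assumed network size
    print(f"6-gram table: {table_entries:.2e} entries")
    print(f"neural net:   {nn_params:.2e} parameters")
    print(f"ratio:        {table_entries / nn_params:,.0f}x")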

As to the ability to incorporate it into other password cracking tools,
to use it as a stand-alone guess generator, or for that matter how it
performs compared to incremental mode, I'd like to look at the code
first before I start speculating on that ;p
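
One quick sanity check on the throughput figure Alexander mentions
below, though: 10^10 candidates in 16 days is only a few thousand
guesses per second, which supports his point about slow vs. fast
hashes. (The fast-hash rate here is an assumed single-GPU ballpark, not
a number from the paper.)

    # Guess-generation rate implied by the figures in the quoted message.
    # fast_hash_rate is an assumed ballpark for a fast hash on one GPU,
    # not a number from the paper.
    candidates = 10**10
    seconds = 16 * 24 * 3600
    gen_rate = candidates / seconds
    print(f"generator: {gen_rate:,.0f} candidates/second")  # ~7,200/s
    fast_hash_rate = 10**9
    print(f"a fast hash outruns it by ~{fast_hash_rate / gen_rate:,.0f}x")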

Matt

On Thu, Aug 18, 2016 at 12:30 PM, Solar Designer <solar@...nwall.com> wrote:

> Hi,
>
> This is not an end-user topic yet, because there's no end-user usable
> code yet, and there might not ever be.  But I felt this is of interest
> to the JtR user community anyway, and as we do not dive into source code
> details yet it is not a topic for john-dev yet.
>
> There's interesting new work here:
>
> "Code for cracking passwords with neural networks"
> https://github.com/cupslab/neural_network_cracking
>
> Paper/slides:
>
> https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/melicher
>
> The authors include a comparison against JtR and hashcat, but without
> detail on which versions and modes were used.  (I am guessing JtR's
> Markov mode was, but incremental mode was not.  That's unfortunate.)
>
> I only skimmed the paper so far.  In one place, it mentions needing 16
> days to generate 10^10 candidate passwords on a GPU.  This would make
> the approach usable for attacking (semi-)slow hashes, but not fast ones.
>
> I am not convinced there's an improvement over Markov and incremental
> modes here - need independent testing for that - but maybe this is a
> mode that would be reasonable to have alongside other modes we have.
>
> Alexander
>
