john-users - Re: Re: Passphrase Creation

Message-ID: <CAJ9ii1H3p+UpGN2vYVMKerDW9927nyMoTR1fY5Uhp2r0svCPfA@mail.gmail.com>
Date: Mon, 10 Sep 2012 13:43:47 -0400
From: Matt Weir <cweir@...edu>
To: john-users@...ts.openwall.com
Subject: Re: Re: Passphrase Creation

The two main problems I've run into while developing techniques to crack
passphrases are 1) Terminology and 2) Training Sets.

Now terminology might sound silly, but I feel it's a big problem in
the password cracking community (just see the discussions between JtR
and Hashcat users/developers). When it comes to passphrases in
particular, though, my biggest issue is figuring out what exactly
people mean when they talk about passphrases. After all, there are many
different passphrase construction techniques. Here are a couple of
examples:

mangydog
mangy dog
goodmangydog
goodmangydog123
my mangy dog is good!
mmdig!
g00d m@ngy d0g
good mangy dog qwerty123456

And then there are the traditional ones:
correct horse battery staple
AliceLovesBob

So is "goodmangydog" a passphrase while "mangydog" is not? How about
"mangydog" vs. "mangy dog"? Is a randomly chosen phrase such as
"correct horse battery staple" a passphrase? Etc. What it really comes
down to is that our end goal is to crack a password no matter how it's
constructed ('password123', 'correct horse battery staple',
'1qaz2wsx3edc', or 'my mangy dog is good'). So when we talk about
passphrase cracking, what we really mean is creating an attack that
targets a specific password/passphrase creation strategy (or a set of
them).
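To make "an attack that targets a specific creation strategy" a bit
more concrete, here is a rough sketch in which each helper models one
of the strategies visible in the examples above: plain concatenation,
spaces between words, appended digits, leetspeak, and first-letter
acronyms. The function names and the leet substitution table are
illustrative assumptions, not JtR or Hashcat rule syntax.

```python
import string

# Hypothetical sketch: each helper models one creation strategy seen in
# the examples. The leet table is an assumed, deliberately tiny mapping.
LEET = str.maketrans({"o": "0", "a": "@", "e": "3"})


def concatenated(words):
    """['good', 'mangy', 'dog'] -> 'goodmangydog'"""
    return "".join(words)


def spaced(words):
    """['good', 'mangy', 'dog'] -> 'good mangy dog'"""
    return " ".join(words)


def with_suffix(words, suffix="123"):
    """['good', 'mangy', 'dog'] -> 'goodmangydog123'"""
    return concatenated(words) + suffix


def leet(words):
    """['good', 'mangy', 'dog'] -> 'g00d m@ngy d0g'"""
    return spaced(words).translate(LEET)


def acronym(phrase):
    """'my mangy dog is good!' -> 'mmdig!'"""
    words = phrase.split()
    letters = "".join(w[0] for w in words)
    # Carry over any trailing punctuation from the last word, e.g. '!'
    last = words[-1]
    tail = last[len(last.rstrip(string.punctuation)):]
    return letters + tail


def candidates(phrase):
    """One variant per strategy for a single base phrase, in a fixed order."""
    words = phrase.lower().rstrip(string.punctuation).split()
    return [
        concatenated(words),
        spaced(words),
        with_suffix(words),
        leet(words),
        acronym(phrase.lower()),
    ]


for candidate in candidates("my mangy dog is good!"):
    print(candidate)
```

Each strategy is trivial on its own; the hard part is knowing which of
them real users actually follow, which is where training data comes in.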

So the next question is: which specific strategies should we be
targeting? Notice how I nicely transitioned into my second problem:
the lack of good training sets ;p I can construct all sorts of attacks
against how I think people create passphrases, but if no one (or only
very few people) actually uses those techniques, then the attacks are
not very useful. As a good example of that, I was once asked how I
would attack ASCII art passwords (specifically ASCII art passwords
that had a lot of === along with a few B's and D's in them). In
response, I created what is quite possibly the largest collection of
one-line ASCII art porn on the internet:

https://sites.google.com/site/reusablesec/Home/custom-wordlists/nsfw_ascii_art.txt.gz

A longer blog posting about this is available here:

http://reusablesec.blogspot.com/2009/06/ascii-art-in-password-cracking.html

The thing is, despite what I said in that blog posting, I've since
found that very few people actually create passwords that way.
Basically, that NSFW dictionary is hilarious, but almost worthless when
it comes to cracking real passwords. So one of the challenges is not
only to identify which passphrase creation strategies people use, but
also which of those strategies are used often enough to make targeting
them worthwhile. This is where some of our large datasets can be
misleading. Sure, we might crack 20 passphrases in the LinkedIn set
using some passphrase cracking rule, but with 6.4 million passwords in
total, does that mean that particular rule was effective or not? I
don't have a good answer for that question.
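Part of what makes the question hard is that "20 out of 6.4 million"
can be framed more than one way. A quick sketch of the arithmetic: the
set size and crack count come from the paragraph above, while the guess
count is a purely hypothetical number picked for illustration.

```python
# Back-of-the-envelope numbers for the LinkedIn example.
TOTAL_PASSWORDS = 6_400_000  # approximate size of the set, from the post
CRACKED = 20                 # cracks attributed to one rule, from the post
GUESSES = 1_000_000          # HYPOTHETICAL: candidates that rule generated


def coverage(cracked, total):
    """Fraction of the whole set that the rule cracked."""
    return cracked / total


def cracks_per_guess(cracked, guesses):
    """Cracks earned per candidate the rule tried."""
    return cracked / guesses


print(f"coverage: {coverage(CRACKED, TOTAL_PASSWORDS):.4%}")  # 0.0003%
print(f"cracks per guess: {cracks_per_guess(CRACKED, GUESSES):.1e}")
```

The coverage figure looks negligible by itself, while the per-guess
figure depends entirely on how many candidates the rule produced, so
the raw crack count alone doesn't settle whether the rule was worth
running.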

BTW, Kzug, thanks for the kind words about my old blog. The particular
passphrase dictionary referenced
(https://sites.google.com/site/reusablesec/Home/custom-wordlists/quote_wordlist_v1.tar.gz)
is very *rough*. There are a lot of artifacts left in it from scraping
Wikiquote, and I left all punctuation/capitalization intact.
Basically, I didn't know (and still don't know) how people create
passphrases, so I figured it'd be easier to strip punctuation out than
to try to add it back in. If people have suggestions, I can go back and
try to update that wordlist and format it a particular way.
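Since the wordlist keeps punctuation and capitalization intact, one
option is to derive cleaner variants from each line on the fly. A
minimal sketch, assuming one quote per line of plain text (an
assumption about the layout of quote_wordlist_v1, not something stated
here); the helper name and the particular output formats are
hypothetical choices for illustration.

```python
import string

# Removes every ASCII punctuation character (apostrophes too,
# so "don't" becomes "dont").
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def variants(line):
    """Hypothetical helper: passphrase candidates derived from one quote."""
    words = line.translate(STRIP_PUNCT).split()
    lower = [w.lower() for w in words]
    forms = [
        " ".join(words),                         # punctuation gone, case kept
        " ".join(lower),                         # lowercase with spaces
        "".join(lower),                          # lowercase, no spaces
        "".join(w.capitalize() for w in lower),  # CamelCase, like AliceLovesBob
    ]
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # the filter skips empty results from blank or punctuation-only lines.
    return list(dict.fromkeys(f for f in forms if f))


for quote in ["To be, or not to be.", "my mangy dog is good!"]:
    for candidate in variants(quote):
        print(candidate)
```

Each candidate goes on its own line, so output like this could be piped
into a cracker that reads candidates from standard input (JtR's
--stdin mode, for example) rather than shipping a separate cleaned-up
copy of the wordlist.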

Matt