john-users - Re: Password datasets with creation rules?

Message-ID: <CAJocqxP+VXC1qoR8wJfZpUmdnFzK8+GwW7FkxsHL6qJC61QHdw@mail.gmail.com>
Date: Sat, 10 Dec 2011 17:27:18 -0600
From: Wesley Tansey <tansey@...utexas.edu>
To: Per Thorsheim <per@...rsheim.net>
Cc: john-users@...ts.openwall.com
Subject: Re: Password datasets with creation rules?

Thanks Per.

> In short: even if you do find any leaks of passwords that are clearly from
> environments with creation policies in place (length/complexity), you won't
> become much wiser without lots of additional info.

Would you mind expanding on that? I'm less interested in gathering summary
statistics than in evaluating the performance of a model on such a dataset.
I've done a pretty exhaustive search at this point though, so I've kind of
lost hope that I'll find one.

The best I found was the MySpace dataset, which I believe required a
non-alphabetic character. It is very noisy data, though: it was retrieved
via phishing, so only 85% of the entries actually match that rule (typos,
mixups with a password for some other site, etc.) and it needs filtering.
It's also a little small (35k after filtering), and the 6-7 character
passwords aren't as interesting from a cracking standpoint, which leaves me
with only about 7k. That makes it a very difficult dataset to work with.
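
For illustration, the filtering described above could look roughly like the
Python sketch below. It assumes the rule is "at least one non-alphabetic
character" and uses a minimum length of 8 to drop the shorter passwords. The
function names and the sample list are made up for the example and are not
taken from any real dataset or tool.

```python
def satisfies_policy(password: str, min_length: int = 8) -> bool:
    """Return True if the password meets the assumed creation rule.

    Assumed rule (hypothetical): at least one non-alphabetic character
    and at least `min_length` characters in total.
    """
    has_non_alpha = any(not ch.isalpha() for ch in password)
    return has_non_alpha and len(password) >= min_length


def filter_by_policy(passwords, min_length: int = 8):
    """Keep only the entries that satisfy the assumed rule."""
    return [p for p in passwords if satisfies_policy(p, min_length)]


# Made-up sample entries, for illustration only.
sample = ["password", "password1", "abc123", "letmein!", "qwertyuiop", "dragon2011"]
kept = filter_by_policy(sample)
print(kept)
```

Entries that fail the check are a mix of typos and passwords meant for other
sites, so a filter like this only removes obvious noise; it cannot confirm
that the remaining entries were really created under the rule.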

> My presentation at Passwords^11 has some statistics based on environments
> where I've had almost complete control of the corporate environments.

Interesting presentation. Do you have a BibTeX reference for it?

Wesley

On Sat, Dec 10, 2011 at 4:33 PM, Per Thorsheim <per@...rsheim.net> wrote:

> On Fri, 2011-12-09 at 18:21 -0600, Wesley Tansey wrote:
> > Does anyone happen to know of any decent-sized, real-world leaked/attacked
> > password datasets that are in the wild and employed password creation rules
> > such as "must contain a number" or "minimum 8 characters"? Plaintext,
> > hashed, or hashed/salted are all fine as long as I can make a guess against
> > each entry and query for its existence in the database. I'm looking for
> > full database releases, not just the cracked ones.
> >
> > All of the datasets I've found that have decent sample sizes (rockyou,
> > gawker, phpbb, battlefield heroes beta) seem to have no creation rules
> > enforced.
> >
> > Wesley
>
> I'm tempted to say "It's not that easy". Well, it's not that easy.
>
> Some of the leaks available may have had creation rules, either on
> "paper" or even technically implemented. However they may have changed
> over time, strengthened or weakened... who knows?
>
> In my experience pentesting corporate environments, it is very common to
> find written policies that are not technically implemented. Do the
> password cracking, and you'll find passwords that comply with neither the
> written nor the implemented policy. This could be due to lazy sysadmins,
> old & unused accounts, frequent changes in password policies etc.
>
> In short: even if you do find any leaks of passwords that are clearly
> from environments with creation policies in place (length/complexity),
> you won't become much wiser without lots of additional info.
>
> My presentation at Passwords^11 has some statistics based on
> environments where I've had almost complete control of the corporate
> environments. You can find it here:
> http://ftp.ii.uib.no/pub/passwords11/presentations/ (PDF, 1.1Mb)
>
> "The Exception" is the only environment I've ever seen where the average
> passwords were "much" longer than the minimum required (length 3, no
> complexity), see page 8. In environments where minimum length is 7+,
> you'll typically see 50% of all accounts having passwords at the minimum
> length.
>
> Pages 13 & 15, based on another data set, also show some very common
> patterns from corporate environments in areas of per-position entropy
> (total number of characters used in each position), and the most common
> password formats found in environments with Windows default complexity
> parameters (3 out of 4 character
>