john-users - Re: Anyone looked at the Ashley Madison data yet?

Message-ID: <20150826031555.GA2082@openwall.com>
Date: Wed, 26 Aug 2015 06:15:56 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Anyone looked at the Ashley Madison data yet?

On Tue, Aug 25, 2015 at 04:27:37PM -0400, KZug wrote:
> Due to the extremely sloooow Bcrypt speed of the A.Madison dump,  I don't see a point at each of us working in its own corner.
> If you are willing to participate, I can open a Dropbox folder and share it. (hashes recovered, etc.)
> By doing so, we don't have to reinvent the wheel each time, and can save few CPU cycles.

I suggest that, if you do this, you all start by defining your goals,
and this will affect whether and how you work together (or at all).

First, obviously you're doing this for research (and not e.g. for having
anyone's account anywhere compromised to any greater extent).  What kind
of research is that?  What would you like to find out, and why?

Is this to get as many passwords cracked as you can, and then state this
figure - e.g., "0.1% of the bcrypt hashes in the dump cracked in 7 days"
(totally arbitrary figures, but these feel realistic to me based on what
was said so far)?  That would give e.g. academic publications on
password security a figure to refer to for the case of very slow salted
hashes without a password policy and with related information available
for each account in a multi-million password hash dump (these are the
factors that I think primarily determine the success rate).  If so, you
may accept contributions from just about anyone.

Is this to create a "top N" list for this leak, to have one more of
those to refer to e.g. in academic publications on password security?
(I doubt the list would be of great other use.  We have many of those
already.  It might be usable to adjust the rankings in a cumulative
"top N" list across many leaks, though.)  If so, you need to define the
methodology first, or the resulting list would likely be badly skewed in
unknown ways.  You can't blindly accept arbitrary contributions
(there are ways to make partial use of those yet avoid biases, but this
is not trivial so it will likely go wrong).

For the "top N" work, you need to "shuf" the dump and pick a fixed
number of lines from it, e.g. 100k if you intend to produce a top 100
list.
To make this even safer, "shuf" the 100k sub-list of hashes for each
potential contributor separately, and give each contributor only their
shuffled list.  This extra measure is in case of interrupted attacks, so
that with a large number of contributors the original 100k list is
attacked uniformly anyway.  (It wouldn't be fatal even if it's not,
though, since it's already shuffled.  However, if a particularly common
password is found closer to the start of the 100k list, it might appear
as even more common than it actually is if some attacks are interrupted.)
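
To illustrate the sampling described above, here is a minimal Python
sketch.  It is not part of the original suggestion, which only mentions
"shuf".  The function names, the seeded PRNG (used so the split can be
reproduced), and the contributor labels are all assumptions made for
illustration.

```python
import random


def sample_sublist(hashes, k, seed):
    """Pick k hashes uniformly at random, similar in effect to `shuf -n k`.

    The seed is an assumption: it only makes the selection reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(hashes, k)


def per_contributor_lists(sublist, contributors, seed):
    """Give every contributor the same hashes in an independent random order.

    If some attacks are interrupted part-way, the hashes that did get
    attacked then differ per contributor, so coverage of the sub-list
    stays roughly uniform overall.
    """
    lists = {}
    for name in contributors:
        # Hypothetical scheme: derive a distinct PRNG per contributor label.
        rng = random.Random(f"{seed}:{name}")
        order = list(sublist)
        rng.shuffle(order)
        lists[name] = order
    return lists
```

With placeholder strings standing in for hash lines, one would call
sample_sublist(all_lines, 100000, seed) once, then
per_contributor_lists(sub, names, seed), and hand each contributor only
their own ordering.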

Also, to avoid increasing the damage of this leak you should keep your
cracked password list such that it's not valuable to attackers.
Focusing e.g. on a 100k sub-list and creating a top 100 list from it
will likely make your cracked passwords less valuable for possible
misuse as compared to attacking the entire dump and maximizing the
number of hashes cracked through the accounts' related info (the
GECOS-like fields).  You will probably produce way fewer cracked
passwords when
processing only a relatively small sub-list.  For a number of reasons, I
do not seriously think that any of these cracked passwords would be used
for actual attacks, yet it's an angle you might want to consider.

Alexander

P.S. I don't intend to participate.  I am merely commenting on this.
And yes, please take the actual coordination, if any, off-list.