john-users - team john-users writeup for DEFCON 2011 "Crack Me If You Can" contest

Message-ID: <20110818131035.GA1182@openwall.com>
Date: Thu, 18 Aug 2011 17:10:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: team john-users writeup for DEFCON 2011 "Crack Me If You Can" contest

Hi,

As many of you are aware, we participated in KoreLogic's "Crack Me If
You Can" password cracking contest at DEFCON earlier this month, as team
john-users.  We ended up taking 3rd place overall (out of 22), and we
were first for 5 out of 20 hash types.  Additionally, we temporarily
held 1st place twice during the contest: at approx. 5 hours and at 21-23
hours into it.  Here are the statistics for all teams:

http://contest.korelogic.com/stats.html

including pretty graphs of teams' progress over time, and here are the
per-hash crack numbers for our team in particular:

http://contest.korelogic.com/stats_7D47E99A316E29D7.html

Now to the writeup, to be re-published on the contest website:


	Preface.

The contest was fun and challenging.  It helped us test some
experimental John the Ripper code and identify areas for further
improvement.  As of this writing (August 18, 2011), we already have
experimental patches implementing MSCash2 in CUDA (thanks, Lukasz) and
implementing pkzip
encryption cracking (thanks, JimF).  We didn't have those prior to and
during the contest...

We'd like to thank KoreLogic for organizing the event.  We would also
like to thank all other teams who participated and made it tough for us
to compete. ;-)


	Resources.

Active members: 16

Names / nicks:
Aleksey Cherepanov, bartavelle, Brad Tilley (team 16Crack), elijah,
Frank Dittrich, groszek, guth, Isif, JimF, Matt Weir, RichRumble,
samu, Sergey, smooge, Solar Designer, Lukasz

Additionally, Brandon Enright contributed four 8-core Amazon EC2
instances (32 cores total) and Michael Boman provided remote access to a
quad-core machine.

Software: John the Ripper (with various patches), custom scripts,
16Crack (used by Brad only), pdfcrack (no luck), fcrackzip (no extra
cracks compared to trivial shell scripts around unzip), rarcrack and
crark (no luck, but JtR cracked the password instead), ElcomSoft's
password recovery tools (no additional cracks)

Hardware: mostly 8-core servers (some of them also doing something else
at the same time), but also all other kinds of machines (desktops,
laptops, servers) ranging from dual-core to 12-core, plus the Amazon EC2
instances mentioned above.  3 low-end to mid-range NVidia GPUs (used
only on phpass hashes using john-1.7.8-allcuda-0.2 by Lukasz), one ATI
Radeon HD 5770 (used for real-world'ish testing of
john-1.7.8-jumbo-5-opencl-1 rather than to make much progress in the
contest).  The number of CPU cores in use grew slowly from 0 to approx.
300 by the end of the contest, with the average estimated at around 150.
We did not prepare well, so some machines were put to use as late as
3 hours before the contest ended, and some of the servers were
inappropriate to use without someone watching over them.


	Preparations.

Two days before contest start, we restored our file exchange server
(actually an OpenVZ container) from a backup dump from last year, and
started creating accounts for some new team members.  (The scripts used
to process and submit cracked passwords had to be revised slightly for
the new contest, but this was not known in detail before contest start,
so this step was taken during the first hours of the contest.)

With John the Ripper being our primary tool (almost the only password
cracking tool we used, in fact), and with us having access to many more
CPUs than GPUs, we needed a way to manage the many CPU cores
efficiently.  Thus, a customized contest-only edition of John the Ripper
was made and some scripts were written (but only made usable for the 2nd
day of the contest, unfortunately), which made it slightly easier for us
to manage multiple multi-core machines.  Other changes in the contest
edition of John the Ripper included revised incremental mode and
sse-intrinsics.S pre-compiled from .c using Intel's compiler (for
optimal performance on MD5-based hashes).

We also generated new .chr files from RockYou passwords, and uploaded
some wordlists and some rulesets to our file server, including
KoreLogic's ruleset from the 2010 contest revised to make more extensive
use of the rule preprocessor in JtR and re-ordered by decreasing rule
efficiency.

We definitely could have prepared a lot better.


	Approach, observations, mistakes.

Based on last year's experience and on password cracking experience in
general, we expected to derive all sorts of patterns from cracked
passwords and apply those to crack even more passwords.  This is also
what other well-performing teams did in these two contests.

The password-protected .zip files were cracked with shell one-liners running
"unzip -P" and reading passwords from a wordlist.  Luckily, this worked.
(The .zip support implemented in JtR -jumbo was limited to WinZip/AES,
not supporting the older pkzip encryption.)  Brad Tilley (team 16Crack)
was the first to crack the "defcon" password for our team.
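
For readers who prefer a script to a shell loop, the same wordlist idea
can be sketched in Python.  This is not what we ran during the contest
(we used shell one-liners around "unzip -P"); it is a minimal sketch
that swaps in Python's standard zipfile module, which handles only the
older pkzip-style encryption.  The file names in the usage comment are
hypothetical.

```python
import zipfile
import zlib


def password_opens(zip_path, password):
    """Return True if every archive member decrypts and passes its CRC check."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            zf.setpassword(password.encode("utf-8"))
            # testzip() returns None when no member fails its CRC check.
            return zf.testzip() is None
    except (RuntimeError, zipfile.BadZipFile, zlib.error):
        # A wrong password usually raises RuntimeError; the occasional wrong
        # password that slips past the quick check fails during decompression.
        return False


def first_working_password(candidates, check):
    """Try candidates in order; return the first one check() accepts, else None."""
    for candidate in candidates:
        candidate = candidate.rstrip("\r\n")
        if candidate and check(candidate):
            return candidate
    return None


# Hypothetical usage (file names are placeholders):
# with open("wordlist.txt", encoding="latin-1") as wordlist:
#     print(first_working_password(
#         wordlist, lambda pw: password_opens("archive.zip", pw)))
```

Keeping the check as a separate callable means the same loop could wrap
any other "does this password open it" test.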

The .rar was cracked with JtR, running password.lst with --rules for
several hours on an 8-core machine.  RichRumble did this.

To derive patterns, "fast" hashes were attacked first - NT and raw MD5.
In fact, due to us having more machines than people, two 8-core machines
were running JtR in incremental mode (for lengths up to 11) against
these hashes almost until the end of contest, even though this was not
the best use of resources (by far), as far as points are concerned.

The --external=DateTime mode was used on all saltless hashes when this
pattern was noticed.  Then more focused attacks were run with custom
scripts against salted hashes (on just the date formats actually seen).

Similarly, the "Mississippi" and "obsessiveness" patterns were noticed
and tested against various hash types (wasting time when tested against
the slowest hashes, as it turned out).

Not all of our machines were fully online, and not all people were
available at all times.  This resulted in us having to give out large
yet non-critical jobs to team members who expected to be offline for a
while.  For example, this might be why we performed so well at DES (even
though we did not crack the DES hashes found in coredumps, being unsure
what they were).  This was otherwise not an optimal use of resources
considering the low points earned per DES-based crypt hash (although the
100k bonus compensated for that somewhat).

The mscash2 and bf hashes were successfully attacked almost exclusively
with incremental mode.  Late in the contest (too late), we also started
locking it to specific letter-digit patterns that we saw in passwords
cracked by that point.  Unfortunately, we wasted lots of resources
testing other patterns against these hashes - patterns seen in passwords
for other hash types, but somehow not for these.  It was weird
(unrealistic) to find plenty of short passwords (4 to 6 characters
long), yet find none from RockYou's top 1000 and none derived from
usernames.
So we kept probing for other patterns, wordlist entries, etc. but found
none, besides the trivial ones:

$ fgrep '$DCC2$' john.pot | cut -f2- -d: | sed 's/[a-z]/l/g; s/[0-9]/d/g' | sort | uniq -c | sort -rn
    148 llllll
     64 lllllll
     61 dd-dd-dd
     55 dddddd
     20 llllld
     12 lllll
     12 lllldd
      4 llllldd
      4 llll
      3 lllllld
      2 lllddd
$ fgrep '$2a$' john.pot | cut -f2- -d: | sed 's/[a-z]/l/g; s/[0-9]/d/g' | sort | uniq -c | sort -rn
    158 llllll
     54 dddddd
     51 lllllll
     44 dd-dd-dd
     17 lllldd
     14 llllld
     14 lllll
      3 llll
      2 lllddd

As seen on phpass and bsdi hashes that we cracked, we presumably could
also find passwords built upon "pennteller" and "hate", but perhaps not
much else.  (KoreLogic has not yet released the plaintexts as of this
writing, and we did not spend further resources cracking the hashes
after contest end, hence the uncertainty.)

Although we did notice cracked passwords for these hashes starting with
one of just a handful of letters (except for those starting with a
digit, indeed), we did not use this knowledge in any way, thinking that
it was an artifact of our use of incremental mode (which tries more
likely characters before less likely ones).  Thus, we did not manually
restrict the search to just these starting letters, which was probably
a mistake.  We did generate new .chr files based on already cracked
passwords, which would have achieved a similar effect, especially with
our revised incremental mode, but we did so based on all cracked
passwords (excluding only those that came from challenges), for all hash
types, naively expecting patterns from other hash types to show up on
the extra-slow hashes as well.  And, of course, cracked passwords for
all hash types combined started with all other letters as well.
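
In hindsight, a per-hash-type tally of first characters would have made
the difference visible.  The following is a minimal Python sketch of
such a check, not something we ran during the contest; the function name
is made up, and it assumes the usual "hash:plaintext" layout of
john.pot.

```python
from collections import Counter


def first_char_counts(pot_lines, hash_tags):
    """Count first characters of plaintexts for pot lines matching any tag."""
    counts = Counter()
    for line in pot_lines:
        line = line.rstrip("\n")
        if ":" not in line or not any(tag in line for tag in hash_tags):
            continue
        plaintext = line.split(":", 1)[1]
        if plaintext:
            counts[plaintext[0]] += 1
    return counts


# Hypothetical usage: compare the slow hashes against everything else.
# with open("john.pot", encoding="latin-1") as pot:
#     lines = list(pot)
# slow = first_char_counts(lines, ["$DCC2$", "$2a$"])
# print(sorted(slow.items(), key=lambda item: -item[1]))
```

Running it once with only the slow hash tags and once with all tags
would show whether the small set of starting letters is specific to the
slow hashes or merely a side effect of incremental mode's ordering.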

At the same time, we cracked many far more complicated passwords for
other hash types, and even phrases of up to six words (mostly idioms
found in wordlists as-is, though).  Some very short passphrases were
even found with the revised incremental mode (up to 3 words, length 11).
We also used trivial Perl scripts to combine words from tiny wordlists
into 2-, 3-, and 4-word "phrases".
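
The word-combining scripts were trivial, and the idea fits in a few
lines.  Here is a minimal Python sketch of it (our scripts were in Perl
and are not reproduced here); the function name is made up, and the
separator is an assumption, so pass sep="" to join words without
spaces.

```python
from itertools import product


def phrases(words, min_words=2, max_words=4, sep=" "):
    """Yield every ordered combination of min_words..max_words words.

    Words may repeat within a phrase, so a list of N words yields
    N**2 + N**3 + N**4 candidates with the default settings.
    """
    for count in range(min_words, max_words + 1):
        for combo in product(words, repeat=count):
            yield sep.join(combo)


# Hypothetical usage: print candidates for a cracker that reads from stdin.
# for candidate in phrases(["correct", "horse", "battery"]):
#     print(candidate)
```

The candidate count grows as the 4th power of the wordlist size, which
is why this only works with tiny wordlists.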

Note: this does not mean that passphrases are weak or a bad idea in
general; it merely means that some of them contain well-known or
predictable combinations of words, or too few too common words.
It also means that some hash types should not be used for password
hashing.  With the resources we had, in the 48 hours of the contest we
would not have been able to crack 3-word combinations generated by pwqgen
with default settings and hashed with bcrypt (known as bf in this
contest): http://www.openwall.com/passwdqc/


	What we liked and didn't like.

Overall, the contest was great, thanks to KoreLogic and all teams.

We liked:

- The scoring system.  While last year's contest demonstrated that with
equal value of each cracked password, slow and salted hashes are not
worth attacking very hard, if at all, this year's has demonstrated that
they can nevertheless be attacked if the passwords are sufficiently
valuable.  (However, contrary to what outside observers might think, it
has not demonstrated that those stronger hashes are almost as vulnerable
as the weaker ones, despite the numbers of passwords cracked being
comparable.  This is the case only due to extremely weak passwords that
a properly configured system should not allow to be set, or at least
should warn the user about.)

- The presence of passphrases.  We missed those last year.

- Additional challenges in the contest, yet not terribly important to
the teams' overall scoring (otherwise this would not be a password hash
cracking contest anymore).

A concern, though, was that some of the challenges could require use of
non-free and closed-source tools.

Some things we found slightly disappointing were:

- Weird weights for some of the hashes: no distinction between saltless
and salted (semi-)fast hashes, mscash2 being valued too high (whereas it is
actually a lot easier to attack than bf, considering its GPU-friendliness,
albeit not by our CPU-focused team).

For example, the weights could be:

bf - 100000
mscash2 - 50000
phpass-md5 - 12000
md5-crypt - 10000
md5_gen(28) - 10000
bsdi - 5000
des - 700
md5_gen(12) - 700
md5_gen(16) - 700
mssql - 700
oracle11 - 700
phps - 700
ssha - 700
md5_gen(22) - 12
md5_gen(23) - 12
mysql-sha1 - 12
raw-sha512 - 12
raw-sha1 - 11
md5_gen(0) - 10
nt - 10

considering the speed of hash computation, number of different salts in
contest hashes (for each hash type), and some special properties of
these hashes (such as the length limit with des).  The 60x to 70x gap
between saltless and salted hashes proposed here is roughly sqrt(number
of salts), which is consistent with the use of a logarithmic scale for
hash computation speed.

- Passwords still not being very realistic (even though KoreLogic might
not agree).  Username-based passwords not seen on slow hashes.

- No non-ASCII passwords, or maybe we failed to find them (despite
having wasted a little bit of time on trying to do so).  OK, at least
this is almost realistic - those passwords are in fact very rare.  So we
can't really expect to have both a non-negligible number of non-ASCII
passwords and realistic passwords overall.
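
For the weights proposed earlier in this list, the gap between the
salted group (700) and the saltless group (10 to 12) can be checked with
a few lines of Python.  This is a back-of-the-envelope sketch: the
weights are copied from the list, while the "implied salts" figure is
merely the gap squared (inverting the sqrt relation) and is not a salt
count taken from the contest data.

```python
SALTED_WEIGHT = 700  # des, md5_gen(12), md5_gen(16), mssql, oracle11, phps, ssha

SALTLESS_WEIGHTS = {
    "md5_gen(22)": 12,
    "md5_gen(23)": 12,
    "mysql-sha1": 12,
    "raw-sha512": 12,
    "raw-sha1": 11,
    "md5_gen(0)": 10,
    "nt": 10,
}


def gap(saltless_weight):
    """Ratio of the salted weight to a saltless weight."""
    return SALTED_WEIGHT / saltless_weight


def implied_salts(saltless_weight):
    """Salt count for which sqrt(salts) would equal the gap (illustrative only)."""
    return gap(saltless_weight) ** 2


for name, weight in SALTLESS_WEIGHTS.items():
    print(f"{name:12s} gap {gap(weight):5.1f}x  implied salts ~{implied_salts(weight):.0f}")
```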

A neutral comment:

- The bf and bsdi hashes could actually be even slower, to match
real-world systems where these hashes are used.  For example, bf is
nowadays often used at $2a$08, not $2a$05, which it was in the contest
(and which JtR uses for benchmarking for historical reasons).  This
would be 8 times slower.  The default of 725 iterations for bsdi is in
fact seen on some real-world systems, although reasonable settings are
much higher.  When phpass falls back to CRYPT_EXT_DES (the PHP name for
this hash type), it uses these hashes at 65535 iterations (90 times
slower than in the contest) when called with 8 for the
$iteration_count_log2 parameter to the PasswordHash constructor, like
its test program does and like some web apps that have integrated phpass
do.  Such changes could make the contest more realistic and would not
make these hashes appear weaker than they actually are (in real-world
uses).  However, they could make it too hard to attack the hashes
reasonably in just 48 hours, so this is not obviously a good change to
make in the contest.  If the change is made, then of course the weights
would need to be adjusted accordingly (using a logarithmic scale).  An
alternative is to document the "cost" settings of variable-cost hashes
used in the contest in some prominent place such that people do not draw
erroneous conclusions about the hashes from the contest results.
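
The slowdown factors quoted above follow directly from the cost
settings.  A minimal Python sketch of the arithmetic, using only the
numbers given in the text (the function names are made up):

```python
def bcrypt_slowdown(contest_cost, real_cost):
    """bcrypt's cost field is a base-2 logarithm, so each step doubles the work."""
    return 2 ** (real_cost - contest_cost)


def bsdi_slowdown(contest_iterations, real_iterations):
    """bsdi work scales linearly with the iteration count."""
    return real_iterations / contest_iterations


print(bcrypt_slowdown(5, 8))                  # 8
print(round(bsdi_slowdown(725, 65535), 1))    # 90.4
```

Note that 65535 is 2**16 - 1, so on a logarithmic weight scale the bsdi
change amounts to roughly 6.5 doublings versus 3 for the bf change.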

Thanks for reading this far (or did you just scroll down?)

Alexander