john-users - CrackHeads team writeup (KoreLogic DEFCON contest)

Message-ID: <20100803015625.0ae72c10@gamma>
Date: Tue, 3 Aug 2010 01:56:25 +0000
From: Brandon Enright <bmenrigh@...d.edu>
To: john-users@...ts.openwall.com
Cc: bmenrigh@...d.edu, t2maddock@...d.edu, Ron <ron@...llsecurity.net>,
        Minga Minga <mingakore@...il.com>
Subject: CrackHeads team writeup (KoreLogic DEFCON contest)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is a long email.  I have broken it into sections:

* [THANKS]
* [PRE-CONTEST]
* [RESOURCES]
* [STRATEGY]
* [ADAPTATIONS]
* [MISTAKES]
* [SUGGESTIONS]
* [END]


[THANKS]

First, thanks to Minga, Hank, and KoreLogic for taking the time to put
on the contest!  I know hosting a contest is a ton of work and often
quite a thankless job.  Also, it's one thing to host a contest and it's
another to host and fund cash prizes without even charging a
registration fee.  I really appreciate it.

Also, thanks to Solar Designer and all the John contributors.  John is
great software and using it makes cracking fun.  Without John I doubt a
contest like this would even have been possible.


[PRE-CONTEST]

I have put a lot of engineering time into cracking various hashes both
for my job here at UC San Diego as well as for fun cracking things like
PHPBB and Carders.CC.

Ron Bowes found out about the contest and suggested we put together a
team.  I realized that my process of cracking is somewhat chaotic so I
spent many hours of free time in July organizing, cleaning up word
lists, researching Amazon EC2, etc.

I decided that I would do most of my cracking out of Amazon EC2 and my
friend and co-worker Tom Maddock would crack with his ATI Video card
using oclHashcat and he would manage our 2000 core-hours of cluster
time.

I built out a pretty nice Gentoo-based Amazon Machine Image (AMI) and I
grabbed john code, patches, etc and rolled them into the image.  I
also rolled in some perl scripts, Matt Weir's probabilistic cracker,
and various other supporting code.

I had the foresight to hack up a perl script and stuff it in cron to
automatically monitor the john.pot file and email a team mailing list
the new cracks every 5 minutes.  By building this into the AMI
before-hand much of the maintenance work of managing cracks was
automated.  This saved so much time.  During the contest I wrote a
procmail script to automatically collect the new cracks from the emails
and prep them for manual submission.
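
For anyone wanting to set up something similar, here is a minimal
sketch of the "what is new in john.pot" step.  It is in Python rather
than the perl I used, and it is not my actual script: the function name
and the separate offset file are assumptions made for illustration.  It
remembers how far it read last time and returns only complete lines
added since then.

```python
import os


def read_new_cracks(pot_path, state_path):
    """Return pot lines appended since the last call, remembering the offset."""
    try:
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    # If the pot file shrank (for example it was replaced), start over.
    if os.path.getsize(pot_path) < offset:
        offset = 0

    with open(pot_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Only consume complete lines; a half-written line waits for the next run.
    end = data.rfind(b"\n") + 1
    with open(state_path, "w") as f:
        f.write(str(offset + end))

    return data[:end].decode("utf-8", errors="replace").splitlines()
```

A cron job could call this every few minutes and mail whatever it
returns; the mailing part is left out here.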

Since I didn't know what sort of passwords the contest would use I also
spent quite a bit of time building and cleaning wordlists and
organizing them.  This turned out to be nearly useless.


[RESOURCES]

On my end:

I had two Dell servers each with 8 cores running at 2.8 GHz.  I ended
up not using these for any CPU time.  Instead I just used them to
manage wordlists, write code, submit cracks, etc.

I also had my Gentoo AMI built out for cracking and the ability to
spawn up to 20 instances on EC2.  I peaked at 17 during the contest.

Over the 48 hours I averaged roughly 12 instances running, each
averaging about 80% load.  That's roughly 12 * 8 * .8 = 77 cores busy
on average, or about 3,700 core-hours of processing over the contest.
This cost me $185.
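
As a quick check of that arithmetic, here is the calculation spelled
out.  The inputs are the figures above; reading the 12 * 8 * .8 product
as an average number of busy cores and multiplying by the 48 contest
hours is my interpretation, and the per-core-hour cost is only a rough
derived number.

```python
instances = 12    # average number of running instances
cores_each = 8    # cores per instance, as used in the arithmetic above
load = 0.8        # average load per instance
hours = 48        # length of the contest
cost_usd = 185    # total EC2 bill

avg_busy_cores = instances * cores_each * load   # average cores doing work
core_hours = avg_busy_cores * hours              # total work over the contest
cost_per_core_hour = cost_usd / core_hours       # rough price of one core-hour

print(f"average busy cores: {avg_busy_cores:.1f}")
print(f"core-hours:         {core_hours:.0f}")
print(f"USD per core-hour:  {cost_per_core_hour:.3f}")
```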


On Tom's end:

Tom owns an ATI 5850 video card that he used oclHashcat on.  We weren't
sure how best to use the card and we ended up letting it sit idle for
about half the time.  It is hard to gauge how useful the video card
was.  Although it only cracked a couple hundred hashes that we didn't
get with John, we used all of our cracked passwords to spot patterns
and to apply more rules to.

Tom also was in charge of our cluster time.  See:

http://tritonresource.sdsc.edu/cluster.php

We only had 2000 core-hours, of which we used 1700.  That averages to
roughly 35 cores running through the 48 hours of the contest.
The cluster is running Rocks 5.3.  We only focused on salted hashes
with the cluster.  This ended up being mostly wasted resources.

Magnum's MPI patch had several problems that prevented it from
working.  Tom spent about 12 hours working on getting it going.  Most
of the problems had to do with file locking and storing passwords in
the john.pot file.  There was also an issue with all cores starting at
the same place and cracking the same things rather than distributing
the work for the job.  I don't have any more details and I don't know
how Tom resolved these.


[STRATEGY]

Initially my strategy was going to be to run my highest quality English
wordlist with john --rules and then based on that, adapt.  I was also
planning on running Matt's Probabilistic Cracker trained on a cleaned
up set of RockYou + PHPBB.

I fired up 7 EC2 spot instances of the high-CPU (c1.xlarge) machines.
I dedicated each machine to a hash type: bcrypt, crypt, Oracle, SSHA,
FreeBSD-MD5, SHA, Windows (NTLM + LM).  Spot instance machines are
priced on a fluctuating market based on demand and I was paying roughly
30 cents per machine per hour.

See:
http://aws.amazon.com/ec2/#instance

I was using the "High-CPU Extra Large Instance".

The English wordlist sucked against the passwords in the contest and
Matt's cracker (with the training I did) was even worse.  It became
clear that I wasn't going to crack any bcrypt hashes and I wasn't sure
I was doing anything useful on the Oracle hashes.

In the first 12 hours I made almost no progress with any of the
non-Windows hashes.  I just got each instance cracking what I thought
was the most efficient use of the CPUs and left them there doing their
thing.

I focused much of my early engineering time on cracking the NTLM
hashes.  Once I had exhausted my quality wordlists and rules, I
started feeding already cracked passwords back through john rules which
did a decent job of cracking more.
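
The feedback step amounts to turning the pot file back into a
wordlist.  A minimal sketch, assuming each pot line is "hash:plaintext"
with the first ':' as the separator (the function name is made up for
this example, and formats whose hash field itself contains a ':' would
need more care):

```python
def pot_plaintexts(pot_lines):
    """Return unique plaintexts from pot-style lines, in first-seen order."""
    seen = set()
    words = []
    for line in pot_lines:
        line = line.rstrip("\r\n")
        # Assumption: everything after the first ':' is the plaintext,
        # so plaintexts that contain ':' themselves are kept whole.
        _hash, sep, plain = line.partition(":")
        if not sep or not plain or plain in seen:
            continue
        seen.add(plain)
        words.append(plain)
    return words
```

The resulting list, written one word per line to a file, can then be
used as the wordlist for another john --rules run.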

I got 4 hours of sleep Friday morning.  When I got up I had to do 5
hours of classwork for a paper due.


[ADAPTATIONS]

It was obvious by Friday afternoon that our strategy was not
working well.  I wanted to apply patterns to our already cracked
passwords but I'm not very good with John rules.

I started out by creating a list of common source words:
http://noh.ucsd.edu/~bmenrigh/common.txt
And a very targeted list of only the really common ones:
http://noh.ucsd.edu/~bmenrigh/common_short.txt

I also spotted a lot of first and last names in the crack results so I
used Ron's Facebook names data to crack names in order.  This worked
okay.  It got more than my English wordlist did.

I trained Matt's cracker on the already cracked passwords and fed the
common lists into it.  This made great progress on the NTLM hashes but
because of the slowness of the salted/iterated hashes it only got a
few dozen.  I retrained on updated cracking progress many times and
continued to get good results.

Because Matt's program eats up a ton of memory the more guesses it
makes (like with NTLM) I decided to start 2 high-CPU, high-memory boxes
on EC2.  These cost $2.68 per instance, per hour.  I paid for 5
instance-hours total.  These instances are not worth the money unless
you have a need to use the memory.  I ended up not being able to use
the memory.  I did benefit from the slightly faster CPUs though.

We decided that the cluster would allow us to make big progress on the
slow hashes so I created a 4 GB file with Matt's probabilistic cracker
and we ran that through crypt, SSHA, and FreeBSD-MD5.  This is how
we made most of our progress on the non-Windows hashes.

I decided it would be faster for me to write some C code to mangle
words based on desired patterns rather than learn and test John rules.
I'm not sure whether this is actually true but I went with C anyway.

This word mangling program was by far our biggest breakthrough of the
whole contest.  I think the biggest issue with stand-alone mangling
programs that you stuff into a chain of pipes is that they rob John
of CPU time.  To reduce that impact, I hacked something together that
doesn't use malloc(), doesn't use printf(), and buffers the output to
reduce calls to write() by printing a large number of guesses all at
once.
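
The output buffering idea, sketched in Python rather than C.  This is
an illustration of batching many guesses into few write() calls, not
code from my program; the class name, the 64 KB default and the call
counter are all made up for the example.

```python
import os


class BatchedWriter:
    """Collect guesses in memory and hand them to write() in large batches."""

    def __init__(self, fd, limit=1 << 16):
        self.fd = fd          # file descriptor to write to (1 is stdout)
        self.limit = limit    # flush once this many bytes are buffered
        self.buf = bytearray()
        self.calls = 0        # number of write() calls made so far

    def put(self, guess):
        self.buf += guess.encode("utf-8") + b"\n"
        if len(self.buf) >= self.limit:
            self.flush()

    def flush(self):
        data = bytes(self.buf)
        self.buf.clear()
        while data:
            n = os.write(self.fd, data)   # may write fewer bytes than asked
            data = data[n:]
            self.calls += 1
```

A mangler would call put() for every guess and flush() once at exit,
so the number of system calls depends on the buffer size rather than
on the number of guesses.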

The program takes input and can do things like prepend, insert, or
append numbers, symbols, or any character.  It can also leet-speak,
tOgGle cAsE, transpose eltters, replace charakterz, etc.

I know John rules can do all these things and more so this isn't
exactly a huge breakthrough.  I'm not sure how easy it is to "stack"
John rules while minimizing (but not eliminating) duplicates but my
program handles that naturally by the nature of recursion.
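
To show what I mean by stacking through recursion, here is a small
Python sketch.  It is not the mengele source (which is C) and the three
manglers and their names are just examples.  Each mangler returns a
list of variants, and the recursion only applies manglers in increasing
index order, so each combination of manglers is tried once.  That
avoids many duplicates, though not necessarily all of them once more
manglers are added.

```python
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def capitalize(word):
    return [word[:1].upper() + word[1:]]


def leet(word):
    return ["".join(LEET.get(c, c) for c in word)]


def append_digit(word):
    return [word + d for d in "0123456789"]


MANGLERS = [capitalize, leet, append_digit]


def mangle(word, depth, start=0):
    """Yield word, then every stack of up to `depth` manglers applied to it."""
    yield word
    if depth == 0:
        return
    for i in range(start, len(MANGLERS)):
        for out in MANGLERS[i](word):
            if out != word:   # skip manglers that changed nothing
                # Later levels may only use manglers after this one.
                yield from mangle(out, depth - 1, i + 1)


if __name__ == "__main__":
    for guess in mangle("vegas", 2):
        print(guess)
```

With a depth of 2 this prints things like Vegas, V3g45, Vegas7 and
v3g450, but not V3g450, which needs three manglers stacked.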

It was hacked together during the contest and surely has lots of bugs
but you can grab the code here:

http://noh.ucsd.edu/~bmenrigh/mengele.tar.gz

I used my common_short.txt file to great success with my mengele
program.


[MISTAKES]

Because all the hash formats were worth the same amount, we basically
wasted all of our cluster time cracking a few thousand non-Windows
hashes.  I also only used about 1/3rd of my Amazon EC2 time focused on
Windows hashes.  Cracking NTLM hashes seemed to be the key to winning
the contest.  The majority of our points came from cracking NTLM and I
know we could have cracked a whole bunch more if I hadn't wasted so
much engineering time cracking the salted passwords.  We could have
done a lot with the cluster against NTLM too.

We didn't know how to make use of oclHashcat.  In retrospect we should
have written a script to generate patterns and then feed them into
Hashcat.  Letting it sit idle for so many hours was a huge mistake.

I was planning on dedicating 1 EC2 instance to each set of admin
hashes.  I made the mistake of glancing through the released hashes and
concluding that there was no differentiator between admin and user
hashes.  It wasn't until a few hours before the end of the contest that
I realised it was easy to tell them apart.  Had I known, I would have
started up 14 spot instances instead of 7 and cracked just admin hashes
on one instance per hash type.  This would have helped negate the
salting and overall slowness by focusing work on just high value
hashes.

I should have realized that bcrypt was a lost cause and used the
machine for other tasks.

I had a machine trying to crack the Oracle hashes.  Towards the end I
had heard from enough people that they were, indeed, Oracle hashes that
I decided to throw more resources at them.  I guessed that each of the
1000 were admin hashes.  I started up 4 more high-CPU instances and
split the 1000 hashes into 4 chunks of 250.  I split the input list to
be mangled into 8 chunks.  I started 8 jobs against each group of
hashes for a total of 32 cores going all out.  I left these going for 6
hours and I still did not crack a single one.  Even using the most
efficient mangling configuration for mengele, I still didn't crack a
single one.  These resources would have been much better focused on
NTLM or SSHA.
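
The way that work was divided can be sketched as follows.  This is an
illustration in Python with placeholder hash and word names, not the
commands I actually ran: the hash list is cut into 4 equal groups (250
each for 1000 hashes), the input wordlist into 8 pieces, and every
(group, piece) pair becomes one job, giving 32 jobs in total.

```python
def split_even(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)   # first `extra` chunks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


hashes = [f"hash{i}" for i in range(1000)]   # placeholder hash lines
words = [f"word{i}" for i in range(20)]      # placeholder input wordlist

# One job per (hash group, wordlist piece) pair: 4 machines x 8 cores.
jobs = [(h, w) for h in split_even(hashes, 4) for w in split_even(words, 8)]
print(len(jobs))
```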


[SUGGESTIONS]

The contest was very fun but there are some aspects of it that were a
bit frustrating or even a bit unrealistic.


* For scoring, I think the admin hashes were a good idea.  Having all
the different hash formats worth the same amount though was my biggest
complaint of the contest.

Here is my idea for a point system:

Have 3 different types of plain text -- easy, medium, and hard.  Easy
could be worth 1 point, medium 2, and hard 5.

Have a multiplier for the hash type.  Make LM .5, NTLM 1, SHA 2, SSHA
and CRYPT 10, FreeBSD MD5 25, Bcrypt 100.

Make an admin hash a multiplier of 10.

That is, if you crack a medium NTLM hash it is worth 2 * 1 = 2 points.
If you crack a hard NTLM, admin hash it is worth 5 * 1 * 10 = 50
points.  If you crack a medium admin bcrypt hash it is worth 2 * 100 *
10 = 2000 points.
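
The proposed scoring is just a product of three factors, which could be
written down like this (a sketch of my proposal with made-up names, not
anything the contest actually used):

```python
DIFFICULTY = {"easy": 1, "medium": 2, "hard": 5}

HASH_MULT = {
    "LM": 0.5,
    "NTLM": 1,
    "SHA": 2,
    "SSHA": 10,
    "CRYPT": 10,
    "FreeBSD-MD5": 25,
    "bcrypt": 100,
}

ADMIN_MULT = 10


def points(difficulty, hash_type, admin=False):
    """Score one cracked hash: difficulty * hash multiplier * admin multiplier."""
    return DIFFICULTY[difficulty] * HASH_MULT[hash_type] * (ADMIN_MULT if admin else 1)


if __name__ == "__main__":
    print(points("medium", "NTLM"))
    print(points("hard", "NTLM", admin=True))
    print(points("medium", "bcrypt", admin=True))
```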

Also, give teams that are leading in a hash type or plaintext type a
point boost.  That is, the team that has the most SSHA cracks gets an
extra 1000 points.  The team that has the most admin hashes cracked
gets 1000 points.  The team that has the most easy hashes cracked gets
1000 points.  These bonus points would move from team to team depending
on who is leading in each category.


* For the leaderboard, teams should be able to write a sentence about
their tools.  I felt very guilty that we didn't have "John" in our name
and we weren't on the john-users team.


* For the passwords, at first I thought the plaintexts were very
unrealistic.  Later I started to realize the goal wasn't to see who can
crack realistic hashes but to see who can adapt their cracking
technique to meet the particular circumstance.  No password in the
contest was unrealistic by itself.  When thousands of leet-speaked
versions of "Vegas" are in one set of hashes though, it is the set
taken as a whole that becomes unrealistic.  This isn't a problem with the
passwords, it is a problem with the characterization of the contest.
It wasn't who has the biggest wordlist but who can identify patterns
and apply them.  I really enjoyed figuring out the types of mangling.

Our organization has complexity requirements.  Usually the people that
were going to pick the password "password" don't satisfy the complexity
requirement by choosing "#P4ssW!or2010D", they choose "Password1!".
Only a few people in the organization will go all out. This contest
seemed to be about going after the people who are TRYING to make secure
passwords.


* Without a tiered plain text model like easy, medium, and hard, the
really slow hashes need to be limited to easy.  From cracking all the
NTLM hashes I knew the most likely pattern the Oracle, Bcrypt, and
FreeBSD passwords would take.  Even knowing that I still couldn't crack any
Bcrypt or Oracle hashes.  I think the super-hard hashes need to have
weaker passwords.


* There should be a special password in each hash.  Something like "Yes,
that's it!".  That way we can test to make sure we even know the hash
format.  I thought something was broken with the Bcrypt and Oracle
hashes.


* I was very bummed out that some of the FreeBSD MD5 hashes I cracked
were duplicated in NTLM.  I know duplicate hashes across systems is
realistic but for the purposes of a contest, I don't think there should
be any duplicates.


* I have the impression that the Hashcat and Inside Pro teams did well
by cracking NTLM hashes exclusively.  I won't go as far as to say that GPU
cracking isn't fair -- they did score better than us -- just that
without a more detailed point system I was a bit frustrated that one
technique worked so well.  (If I'm wrong and more than just NTLM/LM
hashes were broken I'm sorry, ignore this comment.)


* PLEASE run the contest next year.  I had a great time this year.  I'm
sure everybody has ideas for how to do a lot better ;-)


[END]

Thanks again for taking the time to organize and fund this contest.  I
had a really good time!

Brandon

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)

iEYEARECAAYFAkxXd08ACgkQqaGPzAsl94J0KgCfdeiX+jMw83zIQfWJHYdgMJ8D
kSwAoLQg9myHCPsx12/5MdSxlt/0sIm/
=YBr0
-----END PGP SIGNATURE-----