john-users - Hashcat BF++ vs JtR Incremental and Markov Modes: (was How does incremental mode works?)

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <028001cdd762$7aad3170$70079450$@edu>
Date: Tue, 11 Dec 2012 00:43:41 -0500
From: "Matt Weir" <cweir@...edu>
To: <john-users@...ts.openwall.com>
Subject: Hashcat BF++ vs JtR Incremental and Markov Modes: (was How does incremental mode works?)

So I've been playing around with Hashcat's Brutefoce++ mode and I figured I
should share my findings with everyone else.

 

Quick Disclaimer: 

All my observations about Hashcat's Bruteforce++ mode are based on using
Hashcat's statsprocessor (version 0.08). According to the documentation, it
uses the same algorithm as Hashcat's other cracking programs but I could be
wrong. Speaking about documentation, there isn't a whole lot about what BF++
actually does under the hood. I also don't have access to the source-code
nor have I reversed it. So please take everything I say with a grain of
salt. On a related note, Hashcat's wiki states:

 

"There is no known alternative that can handle markov like this, but the
following programs can at least generate wordlists and are partially
configurable: John the Ripper (using --stdout)
http://www.openwall.com/john/"

 

That seems a little bit unfair. It kind of shows the lack of communication
between the different password cracking communities. This isn't just a
Hashcat thing, as I'm willing to bet the JtR community's general knowledge
of Hashcat is limited as well.

 

Hashcat Bruteforce++, (plus some JtR), background:

So just like JtR's Incremental and Markov modes, Hashcat's Bruteforce++ mode
uses the conditional probability of different letters appearing together to
determine how it generates its guesses, (aka Markov chain modeling). The
biggest different between Hashcat's Bruteforce++ mode and JtR's modes is the
algorithm it uses to go through the search space. As can be seen in the
differences between JtR's Incremental and Markov modes, the actual algorithm
used, (aka how it takes advantage of the conditional probability info),
makes a huge difference in how it performs in real cracking attacks.

 

JtR's Incremental mode attempts to generate guesses in probability order
while going through the entire search space, (aka it will eventually try
every combo possible no matter how unlikely it is). JtR's Markov mode on the
other hand makes no attempt to generate guesses in true probability order
but it will only generate guesses when their probability falls above a
limit. Aka it won't try 'vsFD82,sq' but it will go through all the most
likely guesses in the time you allocate to it. While this sounds like a
negative, the reality is you'll likely never have enough time to bruteforce
all 12 character passwords, so ignoring low probability guesses isn't as
harmful as you'd expect. 

 

As far as how they compare, given the same number of guesses Markov mode
will almost always crack more passwords than Incremental mode does. I'm
still looking into why that's the case, but if you know exactly how long
your cracking session will be Markov mode is the way to go. That being said,
since Markov mode requires you to specify a limit, if you don't know how
long your password cracking session will be (or you want to crack a password
as quickly as possible and then just stop) Incremental mode is a better
option.

 

So that leaves us with Bruteforce++. Like JtR's Markov mode you can specify
a limit so that it will only search a limited portion of the keyspace. If
you don't specify any limit it will search the whole keyspace but it
attempts to do so in an optimized fashion. The actual algorithm/looping it
uses is a bit quirky but in a nutshell:

1) Bruteforce++ will exhaust the keyspace for each length guess before
moving on to the next size guesses. Aka it will guess all 3 character
passwords before trying 4 character passwords. This is a big difference from
JtR's incremental mode where it will switch between longer and shorter
passwords depending on their current probability.

2) The order that Bruteforce++ ranks characters to use is dependent on where
that character appears. Aka uppercase characters will be more likely at the
front of a guess and numbers more common at the end. This is like JtR's
Incremental mode, and different from JtR's Markov mode.

3) The first character's order in a guess in Bruteforce++ is based on simple
letter frequency, (since there is no character before it). What's strange
though is that it also appears the second character's order is based on
simple letter frequency as well. I've run a bunch of tests and all of them
seem to show that. It isn't until the third character that it uses 1st order
Markov chains, (aka it calculates the probability of a letter based on the
letter before it).

4) This is the part that'll get me in trouble, but it "seems" that
Bruteforce++ uses the ordering of characters appearing vs. the probability
of them appearing. Aka instead of saying that given 'q', 'u' follows it 98%
of the time, it'll say 'u' is the most likely character to follow 'q'. While
this sounds like an esoteric point, it's actually a pretty big distinction.
As an example of that, the probability info might tell me to only try 'u'
after 'q', while if I'm just using ordering info I have to blindly guess the
average case for how many transitions I want to allow. Aka there might be a
lot of characters that are beneficial to try after 'a', but I have to treat
'a' and 'q' the same way. Both JtR's Incremental and Markov modes use
probability info.

5) Bruteforce++ allows the user to set a threshold for how many potential
transitions they want to allow so unlikely guesses are skipped. This is
slightly different than the limit JtR's Markov model uses which calculates
the total probability of a guess instead. For example, if you set the
threshold in Bruteforce++ to be 10 then only the ten most likely starting
characters will be used. This applies for every following transition as
well. The plus side is it becomes very easy to calculate the total keyspace
BF++ will search. Aka if you have a threshold of 10, and are only cracking 8
character passwords, the total keyspace searched would be 10^8, or
100,000,000.

6) I'm not going to go too much into it in this post, but Bruteforce++ also
allows you to specify a "Mask" to use as well. Aka you can say "only try
digits at the end" or "only try capital letters at the front". This is very
useful when targeting password creation requirements or you have some
knowledge about the target passwords that was not contained in the training
set.

 

Bruteforce++ vs. Incremental vs. Markov Results:

So now that all the background is out of the way, let's see how they
actually compare when cracking real passwords. The first step I took was to
make sure all three methods were trained on the same training set. As I've
done previously, I randomized and then divided the RockYou list into 32
different 1 million password sets. This way I can train and test against
very similar passwords without training and testing against the exact same
passwords. Please note that I kept duplicate passwords in for both training
and testing. That's been a point of contention in the past, but I strongly
believe that keeping duplicates in is important when figuring out how a
password cracking session will work in real life. I then trained all three
methods using the RockYou_32 list. 

 

For the first test I ran a short cracking session allowing each tool to make
269,538,503 guesses against the RockYou_1 list. Why did I choose such a
random number? Well that's how many guesses JtR's Markov mode would make
with a limit of 200. For the record, I did not limit the charset for any of
the three modes, (aka no Masks), nor did I set a threshold for Bruteforce++
in this test. The results can be found at the link below:

 

https://sites.google.com/site/reusablesec/Home/pictures/hc_bruteforce_graph1
.png

 

As you can see, both of JtR's modes were clear winners here. Now while I
didn't include it in the graph, I also ran a pure brutefoce attack, (which
cracked 2,349 passwords), so a no-threshold Hashcat's BF++ mode did do
substantially better than that (11,419 cracked). Still that wasn't anywhere
near the 446,991 that JtR's Markov mode cracked.  Upon reflection though,
this makes sense since Bruteforce++ essentially got 'stuck' trying to
perform a full bruteforce of five character passwords before it starting
trying to crack 6 character passwords, (which it never got to).

 

For the next test then I limited all three methods to only try 8 character
passwords. I also upped the number of guesses to 425 million total, (Markov
limit of 215), and tested against the RockYou_2 list. This way we can get a
better idea of how Bruteforce++ goes about searching the keyspace. The
results can be found at the link below:

 

https://sites.google.com/site/reusablesec/Home/pictures/hc_bruteforce_graph2
.png

 

Once again, both of JtR's bruteforce modes performed significantly better.
Still, it wasn't a completely fair comparison since I haven't made use of
the threshold option in Hashcat's BF++ mode to keep it from making unlikely
guesses. So I then re-ran the previous test, (requiring 8 character long
passwords), using a threshold of 12 which was optimal for making around 425
million guesses, (12^8 works out to around 430 million guesses). The results
can be found at the link below:

 

https://sites.google.com/site/reusablesec/Home/pictures/hc_bruteforce_graph3
.png

 

While changing the threshold really helped the effectiveness of the Hashcat
BF++ run, it still wasn't enough for it to come close to cracking as many
passwords as either of JtR's modes. So in conclusion, both of JtR's
Incremental and Markov modes appear to perform significantly better than
Hashcat's Bruteforce++ mode even when the threshold is optimally set for
Hashcat's Bruteforce++ mode. As a caveat to that, I haven't taken into
account two potential variables that might make Brutefoce++ a better
algorithm in certain situations. First of all, there's the question of how
all three algorithms perform when used in GPU cracking. I unfortunately
don't have any side-by-side comparison about how those algorithms perform
when implemented in GPUs and I'll have to leave that testing to someone
else. Second there's the Mask feature built into Hashcat's Bruteforce++
mode. This functionality can also be incorporated into JtR's modes using an
'-external' filter but it's a bit of a pain. Still the advantage of using
Masks in conjunction with Markov enhanced brutefoce attacks might be worth
looking into some more.

Bonus Test:

I had a moment of doubt where I wondered if a threshold for Hashcat's
Bruteforce++ mode of 12 was actually the best possible threshold when
generating 425 million guesses, (long story but it has to do with the fact
that it uses ordering vs probabilities). Luckily this was something easy to
test. All I needed to do was re-run the previous cracking runs with
different threshold settings! As a quick spoiler, yes a threshold of 12 was
the best setting to use. Another useful result from this test is it'll
hopefully give you an idea how much 'flexibility' you have when selecting a
threshold that isn't optimal for your cracking session. The results can be
found at the link below:

 

https://sites.google.com/site/reusablesec/Home/pictures/hc_bruteforce_graph4
.png

 

As always, if I've made any mistakes or incorrect assumptions please correct
me.

 

Matt
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.