john-users - RE: Wordlist Mangling Rule

Message-ID: <00f601cb8612$67443e10$35ccba30$@ihug.co.nz>
Date: Wed, 17 Nov 2010 17:46:30 +1300
From: "Al Grant" <acgrant@...g.co.nz>
To: <john-users@...ts.openwall.com>
Subject: RE: Wordlist Mangling Rule

Thanks for that, Alexander. I had started to make sense of it all, but your
explanation sped things up considerably.

I have decided that trying every combination of numbers after my words would
be too time-consuming, so I have revised it to try every word of 8 characters
with nothing appended, but with the case of the first character toggled (T0).

Any word under 8 characters also gets T0, plus the correct number of
sequential digits appended:

Smith123
smith123

I am assuming this would be something like:

<9>7T0
<8>6[T0]$1
etc.

And yes I am intending to use it with aircrack. Thanks in advance,

Cheers

-Al




-----Original Message-----
From: Solar Designer [mailto:solar@...nwall.com] 
Sent: Wednesday, 17 November 2010 12:20 p.m.
To: john-users@...ts.openwall.com
Subject: Re: [john-users] Wordlist Mangling Rule

On Sat, Nov 13, 2010 at 10:23:50AM +1300, Al Grant wrote:
> I have tried from the FAQ rule page to decrypt how the rules you have 
> written work.

I'm not sure what page you refer to.  There's one documenting the rules
syntax, but it's not a FAQ:

http://www.openwall.com/john/doc/RULES.shtml

> Would you mind breaking it down? Ie [c:c] does what etc?

Let's start with a simpler line:

<B >7 [clu]

The square brackets trigger preprocessor expansion.  So this line gets
expanded into 3 separate rules:

<B >7 c
<B >7 l
<B >7 u

Each rule is individually applied to all words from your wordlist.
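
To make the expansion step concrete, here is a rough Python sketch of it.
This is not John's preprocessor, only an approximation of the behavior
described above: the names expand_list() and expand() are made up for the
illustration, and the sketch ignores escapes and does not drop repeated
characters.

```python
import itertools
import re

def expand_list(body):
    """Turn the inside of [...] into a list of characters, e.g. "0-9" or "clu"."""
    chars = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as 0-9: include both endpoints.
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return chars

def expand(line):
    """Expand every [...] list on the line into all combinations."""
    # re.split with a capture group alternates literal text and list bodies.
    parts = re.split(r"\[([^\]]*)\]", line)
    choices = [
        [part] if i % 2 == 0 else expand_list(part)
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]

for rule in expand("<B >7 [clu]"):
    print(rule)
```

With more than one list on a line, itertools.product yields every
combination, so a line with a three-character list and a ten-character list
would give 30 rules.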

Let's look at the first one of these rules:

<B >7 c

It contains three rule commands.  Unlike separate rules (above), the rule
commands in the same rule (on the same line post-expansion) are applied one
after another - that is, the second command is applied to the result of the
first (not to the original word), the third one is applied to the result of
the second, etc.  Also, if one of the commands rejects the input word,
further commands are not used for that word; the entire rule (one line
above) produces no output for such a word.

The first command above is "<B".  The "<" character is the command code.
It is documented in doc/RULES as:

<N	reject the word unless it is less than N characters long

The "B" character corresponds to the "N" placeholder in the documentation -
that is, it is the position code.  These are also documented in doc/RULES:

"Numeric constants may be specified and variables referred to with the
following characters:

0...9	for 0...9
A...Z	for 10...35
[...]"

According to this, "B" specifies the number 11.
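
As a small illustration, the two ranges quoted above can be decoded like
this in Python.  The function name position_value() is hypothetical, and the
sketch covers only the characters quoted here, not the entries elided by
"[...]".

```python
def position_value(code):
    """Decode a single position/length character: 0...9 and A...Z only."""
    if "0" <= code <= "9":
        return ord(code) - ord("0")
    if "A" <= code <= "Z":
        return ord(code) - ord("A") + 10
    raise ValueError(f"character not covered by this sketch: {code!r}")

for code in "7", "A", "B", "Z":
    print(code, position_value(code))
```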

Thus, the command "<B" will reject its input word (and not let it be
processed with further commands on the same line) "unless it is less than 11
characters long".  In other words, it will insist that words be no longer
than 10 - that's one of the requirements you had mentioned for words that
we're not going to append digits to.

The next command is ">7".  (This one is only reached if "<B" did not reject
the word.)  Similarly, this one insists that words be no shorter than 8
characters (8 being the smallest number that is "greater than 7").

Finally, the last command in that rule is "c".  It is documented as:

c	capitalize

Thus, the entire "<B >7 c" rule will capitalize words that are 8 to 10
characters long, but it will reject others.  The next two rules:

<B >7 l
<B >7 u

are similar, except they will "convert to lowercase" and "convert to
uppercase", respectively.
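
The behavior of these three rules can be approximated in a few lines of
Python.  This is only a sketch under some assumptions: apply_rule() is a
made-up name, commands are assumed to be separated by spaces as in the
examples here, "c" is assumed to behave like Python's str.capitalize(), and
only the five commands discussed so far are handled.

```python
def apply_rule(rule, word):
    """Apply the commands left to right; return None if the word is rejected."""
    for command in rule.split():
        code, arg = command[0], command[1:]
        if code == "<":
            # int(arg, 36) maps 0...9 and A...Z to 0...35, so "B" becomes 11.
            if not len(word) < int(arg, 36):
                return None
        elif code == ">":
            if not len(word) > int(arg, 36):
                return None
        elif code == "c":
            word = word.capitalize()
        elif code == "l":
            word = word.lower()
        elif code == "u":
            word = word.upper()
        else:
            raise ValueError(f"command not covered by this sketch: {command}")
    return word

words = ["cat", "password", "mountains", "extraordinarily"]
for rule in ["<B >7 c", "<B >7 l", "<B >7 u"]:
    results = [apply_rule(rule, w) for w in words]
    print(rule, [r for r in results if r is not None])
```

Here "cat" is rejected by ">7" and "extraordinarily" by "<B", so neither
reaches the case-changing command.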

That's all for the simple line discussed so far:

<B >7 [clu]

Now let's see what the next line does:

<8 >6 [clu] $[0-9]

This one gets expanded into as many as 30 rules:

<8 >6 c $0
<8 >6 c $1
[...]
<8 >6 c $9
<8 >6 l $0
[...]
<8 >6 l $9
<8 >6 u $0
[...]
<8 >6 u $8
<8 >6 u $9

(I've omitted many of them above.)

So that's 30 rules, each consisting of 4 commands.  The first 3 of the
commands were already discussed above (although the length limits are
different now).  The fourth one appends a digit:

$X	append character X to the word

(where a specific digit is substituted for the "X" placeholder mentioned in
the documentation).

The next ruleset lines may be:

<7 >5 [clu] Az"[0-9][0-9]"
<6 >4 [clu] Az"[0-9][0-9][0-9]"
<5 >3 [clu] Az"[0-9][0-9][0-9][0-9]"

The last one of these is expanded into as many as 30,000 rules:

<5 >3 c Az"0000"
<5 >3 c Az"0001"
[...]
<5 >3 u Az"9998"
<5 >3 u Az"9999"

Each of the above rules consists of 4 commands, the first 3 of which we've
already discussed.  The fourth is:

AN"STR"	insert string STR into the word at position N

The documentation also says:

"To append a string, specify "z" for the position."

which is also documented in its proper section:

z	"infinite" position or length (beyond end of word)

So we're inserting the "string STR" beyond the end of the word - or in other
words, we're indeed appending the string.  In each of the 30,000 rules
(produced for us by the preprocessor on the fly), only one specific string
to append is specified (e.g., only "0000" initially).
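
The rule counts, and the way the length limits and the appended digits fit
together, can be checked with a little arithmetic.  The Python below is just
that arithmetic written out; the table of limits is read off the five lines
above by hand, and summarize() is a name invented for the sketch.

```python
CASE_VARIANTS = 3  # the three characters in [clu]

# (line, N from "<N", M from ">M", number of [0-9] lists on the line)
LINES = [
    ('<B >7 [clu]', 11, 7, 0),
    ('<8 >6 [clu] $[0-9]', 8, 6, 1),
    ('<7 >5 [clu] Az"[0-9][0-9]"', 7, 5, 2),
    ('<6 >4 [clu] Az"[0-9][0-9][0-9]"', 6, 4, 3),
    ('<5 >3 [clu] Az"[0-9][0-9][0-9][0-9]"', 5, 3, 4),
]

def summarize(lines):
    summary = []
    for text, below, above, digit_lists in lines:
        rules = CASE_VARIANTS * 10 ** digit_lists
        accepted = range(above + 1, below)  # word lengths passing both checks
        candidate_lengths = [n + digit_lists for n in accepted]
        summary.append((text, rules, candidate_lengths))
    return summary

for text, rules, lengths in summarize(LINES):
    print(f"{text:40} {rules:6} rules, candidate lengths {lengths}")
```

Every line that appends digits accepts exactly one word length and pads it to
8 characters, while the first line passes words of 8 to 10 characters
unchanged in length.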

Now let's consider these more complicated ruleset lines:

-\r[c:c] <B >7 \p[clu]
-\r[c:c] <8 >6 \p[clu] $[0-9]
-\r[c:c] <7 >5 \p[clu] Az"[0-9][0-9]"
-\r[c:c] <6 >4 \p[clu] Az"[0-9][0-9][0-9]"
-\r[c:c] <5 >3 \p[clu] Az"[0-9][0-9][0-9][0-9]"

These differ from those we've discussed so far by the addition of "-\r[c:c]"
to the beginning and "\p" into the middle.  Let's see what these achieve.

First, "[c:c]", with its non-escaped use of square brackets, is indeed a
preprocessor expression, much like "[clu]" and "[0-9]", which we've
discussed above.

"\r" and "\p" are "magic escape sequences" to the preprocessor.  These are
documented closer to the end of doc/RULES:

"Finally, the preprocessor supports some magic escape sequences.  These
start with a backslash and use characters that you would not normally need
to escape.

[...]

"\p" before a range to have that range processed "in parallel" with
preceding ranges

[...]

"\r" to allow the range to produce repeated characters."

Thus, this line:

-\r[c:c] <B >7 \p[clu]

is expanded into three rules:

-c <B >7 c
-: <B >7 l
-c <B >7 u

We needed "\r" because we have two instances of the "c" character in "[c:c]"
and we wanted to preserve both (see below for the explanation).
We needed "\p" to have the two character lists - "[c:c]" and "[clu]" -
processed "in parallel".  In other words, we wanted only the three lines
above to be produced, not 9 lines for all combinations, which is what we
would get from the preprocessor by default (and which we relied upon when
appending digits, above).
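
The difference between the default expansion and "in parallel" is roughly
the difference between a Cartesian product and a zip.  The Python sketch
below only mimics that for this one line; it does not parse the escapes.
The last line also assumes that, without "\r", repeated characters in a list
are dropped, which is what the quoted documentation implies.

```python
import itertools

flags = list("c:c")  # "\r[c:c]": the repeated "c" is kept
cases = list("clu")  # "\p[clu]": paired with the preceding list

# With "\p": the i-th flag goes with the i-th case command.
parallel = [f"-{f} <B >7 {c}" for f, c in zip(flags, cases)]

# Without "\p": every flag is combined with every case command.
default = [f"-{f} <B >7 {c}" for f, c in itertools.product(flags, cases)]

# Assumed behavior without "\r": the second "c" would be dropped.
deduplicated = list(dict.fromkeys("c:c"))

print(parallel)
print(len(default))
print(deduplicated)
```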

Now, what does "-c" at the start of a rule do?  This is a "rule reject
flag", documented as:

-c	reject this rule unless current hash type is case-sensitive

Note that unlike "<B" and other "rule commands", which reject individual
input words, the "rule reject flags" reject entire rules.

Thus, if the current hash type is case-insensitive - which pretty much means
LM hashes in practice - the entire rule (which is "<B >7 c") will be
rejected.  Indeed, with a case-insensitive hash there's no point in
capitalizing words when we're going to try them as-is as well (by the next
rule).  If we did not reject the rule, then effectively duplicate candidate
passwords would be generated and hashed, thereby wasting time.

The next rule is:

-: <B >7 l

This one uses a rule reject flag too, but a dummy one:

-:	no-op: don't reject

The only reason why it does, and why this flag is even supported, is to
allow for our use of the preprocessor.  These flags have almost no
performance cost anyway - they're applied per-rule, not per-word.  As you
can see in the log, the rules being applied per-word have their rule reject
flags, if any, already removed from them.

Finally, we have:

-c <B >7 u

which is similar to the first one of these three rules - it is applied to
case-sensitive hashes only.
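
As a rough model of how these flags act on the three rules above, the Python
sketch below filters whole rules once, before any word is processed.  The
name active_rules() is hypothetical, and only the "-c" and "-:" flags
discussed here are handled.

```python
def active_rules(rules, case_sensitive):
    """Drop rejected rules and strip the flag from the ones that remain."""
    kept = []
    for rule in rules:
        flag, _, commands = rule.partition(" ")
        if flag == "-c":
            if not case_sensitive:
                continue  # the entire rule is rejected, not individual words
        elif flag != "-:":
            commands = rule  # no flag recognized by this sketch; keep as-is
        kept.append(commands)
    return kept

rules = ["-c <B >7 c", "-: <B >7 l", "-c <B >7 u"]
print(active_rules(rules, case_sensitive=True))
print(active_rules(rules, case_sensitive=False))
```

For a case-insensitive hash type only the lowercase rule survives, so each
word is tried once rather than three times.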

As to the rest of the original ruleset lines:

-\r[c:c] <8 >6 \p[clu] $[0-9]
-\r[c:c] <7 >5 \p[clu] Az"[0-9][0-9]"
-\r[c:c] <6 >4 \p[clu] Az"[0-9][0-9][0-9]"
-\r[c:c] <5 >3 \p[clu] Az"[0-9][0-9][0-9][0-9]"

these are expanded into larger numbers of rules.  The last one of these is
expanded into 30,000 rules like:

-c <5 >3 c Az"0000"
-c <5 >3 c Az"0001"
[...]
-: <5 >3 l Az"0000"
[...]
-c <5 >3 u Az"9999"

...and we've already discussed the meaning and the rationale of the
individual rule reject flags and rule commands in use by these rules.

Whew, looks like that's all.  This is simple stuff for me, but I see how it
can be complicated for others given that explaining it takes a while.

Does this help?

Alexander