john-users - complicated uses of the rule preprocessor (was: new command AN"STR")

Message-ID: <20100227204541.GA22091@openwall.com>
Date: Sat, 27 Feb 2010 23:45:41 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: complicated uses of the rule preprocessor (was: new command AN"STR")

The following is an update to the message I posted a couple of months ago.
This update reflects the changes made in JtR 1.7.5.  I will quote some
context below, but not all of it.  Those who want more context can see
the previous lengthy message here:

http://www.openwall.com/lists/john-users/2009/12/28/4

On Mon, Dec 28, 2009 at 08:45:21PM +0300, Solar Designer wrote:
> On Mon, Dec 28, 2009 at 11:57:32AM +0100, websiteaccess@...il.com wrote:
> >  Another thing, still with the AN"STR" command:
> > 
> >  I would like to generate testhouse, Testhouse, TestHouse, test_house,
> > test.house, test1house etc... with only 1 line of code. What is the
> > fastest way to do that?
> 
> With the current version of JtR, encoding all of this on one line yet
> avoiding duplicates is not pretty.  The preprocessor works great when we
> need to try all possible combinations - e.g., lowercase and append one
> of three characters, then also capitalize and append one of the same
> three characters - 6 combinations total.  It is not that convenient to
> use when only a subset of the combinations needs to be tried.  In fact,
> before version 1.7.4 that might have been impossible to do.  With 1.7.4,
> you can achieve this via the new "parallel ranges" feature, additionally
> relying on its undocumented and subject-to-change behavior when the
> ranges in question are of different size.

The subject-to-change behavior that I was referring to has actually
changed (on purpose) in 1.7.5.  Thus, some of the preprocessor
expressions found in that message will need to be revised in order for
them to work as intended on 1.7.5+.

> Specifically, your desired list of combinations includes "don't insert a
> character", "insert an underscore", "insert a dot", etc.  There's no way
> to represent the "don't insert" case in a list of characters - it has to
> affect the commands, not just command parameters - or we have to specify
> a magic character that would get deleted later.
> 
> For example, this line:
> 
> : [lc] Az"[ :_.1][hH]ouse" @:
> 
> achieves your desired effect for the word "test".

This one works correctly with 1.7.5 as well.  It does not rely on any
undocumented behavior.
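
To make the effect of this line concrete, here is a minimal Python sketch
(not JtR code) that mimics what the 20 expanded rules do to a single word.
The function name candidates() is hypothetical, and Python's str.lower()
and str.capitalize() are assumed to be close enough to the "l" and "c"
commands for ASCII input.

```python
from itertools import product

def candidates(word, second="house"):
    """Mimic  : [lc] Az"[ :_.1][hH]ouse" @:  for one input word."""
    first_letter = second[0].lower() + second[0].upper()   # the [hH] list
    out = []
    for case_cmd, sep, first in product("lc", " :_.1", first_letter):
        base = word.lower() if case_cmd == "l" else word.capitalize()
        cand = base + sep + first + second[1:]              # Az"..." appends
        out.append(cand.replace(":", ""))                   # "@:" purges every colon
    return out

if __name__ == "__main__":
    for cand in candidates("test"):
        print(cand)
```

For "test" this gives 20 distinct candidates, including testhouse,
TestHouse, test_house and test1house.  A word such as "a:b" loses its own
colon, and a word such as "123" produces each candidate twice.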

> However, it has a number of drawbacks:
> 
> 1. It uses the colon as a magic character, which is then deleted.  If a
> wordlist happens to include some colons, they would get deleted as well.
> 
> 2. Searching for and potentially deleting a character that we've just
> inserted is suboptimal.
> 
> 3. It assumes that the results of "l" (lowercase) and "c" (capitalize)
> will differ.  If an input "word" does not start with a letter, then
> duplicates will be produced.
> 
> 4. It assumes a case-sensitive target hash type (e.g., raw MD5 or NTLM,
> but not LM), or it will effectively produce duplicates (75% of total).
> 
> 5. It assumes no length limit on passwords supported by the target hash
> type, or it might effectively produce duplicates.
> 
> 6. It requires that you turn your word "house" into "[hH]ouse" when you
> generate the ruleset.  It could be convenient to be able to surround the
> word written as-is with some rule commands and to have the ruleset line
> try the appended word in both forms.
> 
> To fully address #1 and mitigate #2, we can change the line to:
> 
> : [lc] Az"[ :_.1][hH]ouse" \p2[:D]\p2[:l]

For 1.7.5+, this should be written as:

: [lc] Az"[: _.1][hH]ouse" \p2[D:]\p2[l:]

which I think is more natural (the special case character is the very
first one listed).
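
The pairing of a 5-character list with a 2-character parallel list can be
sketched in Python as follows.  This is an illustration only: the helper
name pair_parallel() is hypothetical, and the rule "a shorter parallel list
reuses its last character" is an assumption inferred from the .log output
quoted at the end of this message, not a statement about JtR internals.

```python
def pair_parallel(parent, child):
    """Pair each character of `parent` with the character that the parallel
    list `child` contributes to the same expanded rule (assumed behavior:
    a shorter `child` reuses its last character)."""
    last = len(child) - 1
    return [(p, child[min(i, last)]) for i, p in enumerate(parent)]

if __name__ == "__main__":
    separators = ": _.1"                       # range 2 in the line above
    deletes = pair_parallel(separators, "D:")  # \p2[D:]
    positions = pair_parallel(separators, "l:")  # \p2[l:]
    for (sep, d), (_, pos) in zip(deletes, positions):
        print(f"inserted {sep!r}: rule ends in {d}{pos}")
```

Only the first (magic) character is paired with "Dl"; every other inserted
character is paired with "::".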

> This uses the undocumented and subject-to-change behavior of "parallel
> ranges" that I referred to above.  Specifically, when the inserted
> character is other than the second one, the rule will end in "::" (two
> no-op commands, which, by the way, no longer have a performance cost).
> However, when the second character is inserted, then the rule ends in
> "Dl" (delete the character at the initial length).  Please note that
> this line no longer makes any assumptions about the character being
> inserted and deleted; it is only referred to by its position number.
> 
> I am likely to change the behavior in a future version of JtR such that
> the above would need to be written as:
> 
> : [lc] Az"[: _.1][hH]ouse" \p2[D:]\p2[l:]
> 
> which I think would be more natural.  Since this change of behavior is
> not in place yet,

This change is now in place.  It is in 1.7.5.

> I will continue to assume the current behavior in the rest of this message.
> 
> To fully address #3 and mitigate #4, we can do:
> 
> -[:c] \p1[lc] (?\p1[za] Az"[ :_.1][hH]ouse" \p4[:D]\p4[:l]

For 1.7.5+, this should be:

-[:c] \p1[lc] (?\p1[za] Az"[: _.1][hH]ouse" \p4[D:]\p4[l:]

> This will use "(?a" (reject unless starts with a letter) along with "c"
> (capitalize), but "(?z" (effectively a no-op, but a non-free one
> unfortunately) along with "l" (lowercase).
> 
> Additionally, it will use the "-c" rule reject flag (reject the rule
> unless the target hash type is case-sensitive) along with the "c"
> command (capitalize).  Thus, the number of effective duplicates for
> case-insensitive hashes is reduced from 75% to 50%.  The remaining
> duplicates will be because of "house" vs. "House", which is currently
> not reasonable to address within just one ruleset line.
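
The 75% and 50% figures can be checked with a few lines of Python.  This is
a rough sketch: the names build() and duplicate_share() are hypothetical,
and a case-insensitive hash is modelled simply by uppercasing each
candidate before comparing.

```python
from itertools import product

def build(word, case_cmds):
    """Candidates for one word: case command x separator x house/House."""
    separators = ["", " ", "_", ".", "1"]   # "" stands for "don't insert"
    return [(word.lower() if cmd == "l" else word.capitalize()) + sep + h + "ouse"
            for cmd, sep, h in product(case_cmds, separators, "hH")]

def duplicate_share(cands):
    """Fraction of candidates that are redundant once case is ignored."""
    return 1 - len({c.upper() for c in cands}) / len(cands)

if __name__ == "__main__":
    print(duplicate_share(build("test", "lc")))  # both "l" and "c" rules kept
    print(duplicate_share(build("test", "l")))   # "c" rules rejected via "-c"
```

With both case commands, 20 candidates collapse to 5 (75% redundant); with
the "c" rules rejected, 10 collapse to 5 (50% redundant).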

Avoiding those remaining effective duplicates with case-insensitive
hashes has become reasonable with a new feature included in 1.7.5.
I'll describe it at the end of this message.

> We could do:
> 
> -[:c] \p1[lc] (?\p1[za] Az"[ :_.1]\p1[hH]ouse" \p4[:D]\p4[:l]

For 1.7.5+, this should be:

-[:c] \p1[lc] (?\p1[za] Az"[: _.1]\p1[hH]ouse" \p4[D:]\p4[l:]

> which avoids the remaining duplicates for case-insensitive hashes, but
> it also reduces the number of candidate passwords for case-sensitive
> hashes - not all lowercase/capitalize combinations for the two words are
> tried.  Maybe this change is desirable, maybe not.
> 
> To address #5, we can simply add a length check:
> 
> -[:c] <* \p1[lc] (?\p1[za] Az"[ :_.1][hH]ouse" \p4[:D]\p4[:l]

For 1.7.5+, this should be:

-[:c] <* \p1[lc] (?\p1[za] Az"[: _.1][hH]ouse" \p4[D:]\p4[l:]

> This rejects input words that leave no room for the addition of at least
> one character.
> 
> To address #6, we can use one of:
> 
> -[:c] <* \p1[lc] (?\p1[za] Az"house" T[zl] il[ :_.1] \p5[:D]\p5[:l]
> 
> or:
> 
> -[:c] <* \p1[lc] (?\p1[za] $[ :_.1] val1 Az"house" T[zl] \p4[:D]\p4[:a]
> 
> or:
> 
> -[:c] <* \p1[lc] (?\p1[za] $[ :_.1] val1 Az"house" [:T]\p5[:l] \p4[:D]\p4[:a]

For 1.7.5+, these should be:

-[:c] <* \p1[lc] (?\p1[za] Az"house" T[zl] il[: _.1] \p5[D:]\p5[l:]

or:

-[:c] <* \p1[lc] (?\p1[za] $[: _.1] val1 Az"house" T[zl] \p4[D:]\p4[a:]

or:

-[:c] <* \p1[lc] (?\p1[za] $[: _.1] val1 Az"house" [:T]\p5[:l] \p4[D:]\p4[a:]

> All of these include the added word as-is, making it easy to generate
> such a ruleset, but there's a performance impact.  Of these three, the
> last one is the fastest (at least for a specific build of JtR 1.7.4 on
> my computer), because the "$" and "v" commands are cheap compared to "i"
> and because the "T" command is only generated (by the preprocessor) when
> it is needed.

This has remained true for 1.7.5 as well.

> [...] lists of characters with repeats could be a way to help
> avoid duplicates with case-insensitive hashes in the above examples -
> we could write "-\r[:ccc]" followed by "\p1[lclc]" and "\p1[hHHh]" to
> specify the four combinations explicitly such that we're able to exclude
> three of the four with case-insensitive hashes.

I've implemented this in 1.7.5.  Here is a lengthy line that makes use
of this "list of characters with repeats" feature along with everything
else explained above (to deal with all of the numbered drawbacks of the
simple line):

-\r[:ccc] <* \p1\r[lclc] (?\p1\r[zaza] $[: _.1] val1 Az"house" \p1\r[:TT:]\p1\r[:ll:] \p4[D:]\p4[a:]

This is optimal for all hash types at once, whether case-sensitive or
not, but indeed it is more complicated.  It gets expanded into 20 rules
of which only the first 5 are "accepted" with case-insensitive hashes.
That's precisely what was desired.

Specifically, with LM hashes loaded for cracking and only the above line
in [List.Rules:Wordlist], we get the following in the .log file:

20 preprocessed word mangling rules
Rule #1: '-: <* l (?z $: val1 Az"house" :: Da' accepted as '<*l(?z$:val1Az"house"Da'
Rule #2: '-: <* l (?z $  val1 Az"house" :: ::' accepted as '<*l(?z$ val1Az"house"'
Rule #3: '-: <* l (?z $_ val1 Az"house" :: ::' accepted as '<*l(?z$_val1Az"house"'
Rule #4: '-: <* l (?z $. val1 Az"house" :: ::' accepted as '<*l(?z$.val1Az"house"'
Rule #5: '-: <* l (?z $1 val1 Az"house" :: ::' accepted as '<*l(?z$1val1Az"house"'
Rule #6: '-c <* c (?a $: val1 Az"house" Tl Da' rejected
Rule #7: '-c <* c (?a $  val1 Az"house" Tl ::' rejected
Rule #8: '-c <* c (?a $_ val1 Az"house" Tl ::' rejected
Rule #9: '-c <* c (?a $. val1 Az"house" Tl ::' rejected
Rule #10: '-c <* c (?a $1 val1 Az"house" Tl ::' rejected
Rule #11: '-c <* l (?z $: val1 Az"house" Tl Da' rejected
Rule #12: '-c <* l (?z $  val1 Az"house" Tl ::' rejected
Rule #13: '-c <* l (?z $_ val1 Az"house" Tl ::' rejected
Rule #14: '-c <* l (?z $. val1 Az"house" Tl ::' rejected
Rule #15: '-c <* l (?z $1 val1 Az"house" Tl ::' rejected
Rule #16: '-c <* c (?a $: val1 Az"house" :: Da' rejected
Rule #17: '-c <* c (?a $  val1 Az"house" :: ::' rejected
Rule #18: '-c <* c (?a $_ val1 Az"house" :: ::' rejected
Rule #19: '-c <* c (?a $. val1 Az"house" :: ::' rejected
Rule #20: '-c <* c (?a $1 val1 Az"house" :: ::' rejected
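
The expansion shown in this log can be reproduced with a small Python
sketch of the preprocessor.  It is a simplified model, not JtR's actual
code: it handles only literal characters inside [...] (no a-z style ranges
or escaped brackets) and only the \r and \p1..\p9 prefixes.  It assumes
that duplicates are dropped unless \r is given and that a shorter parallel
list reuses its last character; both assumptions are inferred from the
behavior described in this message.  The names parse() and expand() are
hypothetical.

```python
from itertools import product

def parse(rule):
    """Split a rule line into ("lit", text) and ("range", chars, parent) parts."""
    parts, lit, i = [], "", 0
    parent, repeats = None, False
    while i < len(rule):
        if rule.startswith("\\p", i) and rule[i + 2:i + 3].isdigit():
            parent = int(rule[i + 2])          # \pN: run in parallel with range N
            i += 3
        elif rule.startswith("\\r", i):
            repeats = True                     # \r: keep repeated characters
            i += 2
        elif rule[i] == "[":
            end = rule.index("]", i)
            chars = rule[i + 1:end]
            if not repeats:                    # assumed default: drop duplicates
                chars = "".join(dict.fromkeys(chars))
            if lit:
                parts.append(("lit", lit))
                lit = ""
            parts.append(("range", chars, parent))
            parent, repeats = None, False
            i = end + 1
        else:
            lit += rule[i]
            i += 1
    if lit:
        parts.append(("lit", lit))
    return parts

def expand(rule):
    """Return the list of expanded rules, rightmost free range varying fastest."""
    parts = parse(rule)
    ranges = [p for p in parts if p[0] == "range"]
    free = [n for n, r in enumerate(ranges) if r[2] is None]
    expanded = []
    for combo in product(*(range(len(ranges[n][1])) for n in free)):
        index = dict(zip(free, combo))         # range number -> chosen position
        text, n = "", 0
        for p in parts:
            if p[0] == "lit":
                text += p[1]
                continue
            chars, parent = p[1], p[2]
            k = index[n] if parent is None else index[parent - 1]
            index[n] = k                       # lets later ranges refer to this one
            text += chars[min(k, len(chars) - 1)]   # shorter list: reuse last char
            n += 1
        expanded.append(text)
    return expanded

if __name__ == "__main__":
    line = (r'-\r[:ccc] <* \p1\r[lclc] (?\p1\r[zaza] $[: _.1] val1 '
            r'Az"house" \p1\r[:TT:]\p1\r[:ll:] \p4[D:]\p4[a:]')
    for number, text in enumerate(expand(line), 1):
        print(f"Rule #{number}: '{text}'")
```

Only ranges 1 and 4 vary independently (4 x 5 = 20 rules); the other six
ranges follow one of those two, which is why the rule count does not grow
further.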

Alexander

P.S. Maybe a future version of JtR will squeeze out the "(?z" commands
along with other forms of no-ops, but this is not implemented yet.