john-users - Re: new command AN"STR".

Message-ID: <20091228174521.GA10589@openwall.com>
Date: Mon, 28 Dec 2009 20:45:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: new command AN"STR".

On Mon, Dec 28, 2009 at 11:57:32AM +0100, websiteaccess@...il.com wrote:
>  I try the new command AN"STR"

Great.  Thank you for bringing this topic up.

>  my dict has 1 word "test".
> 
>  I did "A3,house," I get  teshouset
> 
>  I did "A4,house,"  -> testhouse
>  I did "A5,house,"  -> testhouse
>  I did "A6,house,"  -> testhouse
>  I did "A7,house,"  -> testhouse
>  I did "A8,house,"  -> testhouse
> I did "A9,house,"  -> testhouse

This is the intended and documented behavior (see the updated doc/RULES).
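
As a rough illustration of that behavior, here is a minimal Python sketch (not JtR code; the function name insert_at is made up for this example).  It assumes that AN"STR" inserts STR at position N and that a position past the end of the word simply appends, which is what the results quoted above show:

```python
def insert_at(word, n, s):
    """Model of AN"STR": insert s into word at position n (0-based).

    Assumption: a position beyond the end of the word means "append",
    matching the results quoted in this thread.
    """
    n = min(n, len(word))
    return word[:n] + s + word[n:]


for n in range(3, 10):
    print(n, insert_at("test", n, "house"))
```

Only position 3 falls inside the four-letter word "test"; positions 4 through 9 all append.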

>  :-/ why I don't get  :
> 
>   A5,house,"  -> test house
>   A6,house,"  -> test  house
>   A7,house,"  -> test   house
>   A8,house,"  -> test    house
>   A9,house,"  -> test     house

The command does not do that, and I doubt that it would be a good idea
to redefine it like that.  Why would you need this?  I don't think
passphrases with multiple spaces between words are common, and if you
want to be inserting just one space character, you can easily include
that character explicitly.

>  An other thing, still with An"STR" command,
> 
>  I would like generate testhouse, Testhouse, TestHouse, test_house, 
> test.house, test1house etc... with only 1 code line. What is the 
> fastest way to do that ?

With the current version of JtR, encoding all of this on one line yet
avoiding duplicates is not pretty.  The preprocessor works great when we
need to try all possible combinations - e.g., lowercase and append one
of three characters, then also capitalize and append one of the same
three characters - 6 combinations total.  It is not that convenient to
use when only a subset of the combinations needs to be tried.  In fact,
before version 1.7.4 that might have been impossible to do.  With 1.7.4,
you can achieve this via the new "parallel ranges" feature, additionally
relying on its undocumented and subject-to-change behavior when the
ranges in question are of different size.
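
To make the "all possible combinations" case concrete, here is a simplified Python model of the expansion (a sketch only, not the actual preprocessor: it handles plain [...] lists, with no escapes and no \p parallel ranges).  The function name expand and the characters 1, 2, 3 are chosen just for this illustration, and the output order is not claimed to match JtR's:

```python
import itertools
import re


def expand(line):
    """Expand every [...] list in line into all combinations."""
    # re.split with a capture group alternates: literal, list, literal, ...
    parts = re.split(r"\[([^\]]*)\]", line)
    literals = parts[0::2]
    ranges = parts[1::2]
    out = []
    for combo in itertools.product(*ranges):
        rule = literals[0]
        for ch, lit in zip(combo, literals[1:]):
            rule += ch + lit
        out.append(rule)
    return out


rules = expand("[lc] $[123]")
print(rules)
```

The two lists of sizes 2 and 3 yield the 6 rules mentioned above.  There is no way in this model to ask for only some of the 6, which is the limitation being discussed.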

Specifically, your desired list of combinations includes "don't insert a
character", "insert an underscore", "insert a dot", etc.  There's no way
to represent the "don't insert" case in a list of characters - it has to
affect the commands, not just command parameters - or we have to specify
a magic character that would get deleted later.

For example, this line:

: [lc] Az"[ :_.1][hH]ouse" @:

achieves your desired effect for the word "test".  However, it has a
number of drawbacks:

1. It uses the colon as a magic character, which is then deleted.  If a
wordlist happens to include some colons, they would get deleted as well.

2. Searching for and potentially deleting a character that we've just
inserted is suboptimal.

3. It assumes that the results of "l" (lowercase) and "c" (capitalize)
will differ.  If an input "word" does not start with a letter, then
duplicates will be produced.

4. It assumes a case-sensitive target hash type (e.g., raw MD5 or NTLM,
but not LM), or it will effectively produce duplicates (75% of total,
since the four case variants of each candidate collapse into one).

5. It assumes no length limit on passwords supported by the target hash
type, or it might effectively produce duplicates.

6. It requires that you turn your word "house" into "[hH]ouse" when you
generate the ruleset.  It could be convenient to be able to surround the
word written as-is with some rule commands and to have the ruleset line
try the appended word in both forms.
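
The effect of the line with "@:" and drawbacks #1 and #3 can be sketched in Python as follows.  This is a simplified model under stated assumptions, not JtR code: the function name candidates is made up, "l" and "c" are modeled with str.lower() and str.capitalize(), and "@:" is modeled as removing every colon from the word:

```python
import itertools


def candidates(word, seps=" :_.1"):
    """Model ': [lc] Az"[ :_.1][hH]ouse" @:' for one input word."""
    out = []
    for case_cmd, sep, first in itertools.product("lc", seps, "hH"):
        # "l" lowercases; "c" capitalizes (first letter up, rest down)
        w = word.lower() if case_cmd == "l" else word.capitalize()
        w += sep + first + "ouse"   # Az"..." appends the string
        w = w.replace(":", "")      # @: purges every colon
        out.append(w)
    return out


print(len(set(candidates("test"))))    # all 20 candidates are distinct
print(len(set(candidates("1test"))))   # drawback 3: "l" and "c" coincide
print(sorted(set(candidates("a:b"))))  # drawback 1: the word's own colon is gone
```

For "test" the 20 candidates include testhouse, Testhouse, TestHouse, test_house, test.house and test1house.  For "1test" only 10 of the 20 are distinct, and for "a:b" no candidate retains the colon.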

To fully address #1 and mitigate #2, we can change the line to:

: [lc] Az"[ :_.1][hH]ouse" \p2[:D]\p2[:l]

This uses the undocumented and subject-to-change behavior of "parallel
ranges" that I referred to above.  Specifically, when the inserted
character is other than the second one, the rule will end in "::" (two
no-op commands, which, by the way, no longer have a performance cost).
However, when the second character is inserted, the rule ends in
"Dl" (delete the character at position "l", the word's initial length,
which is exactly where the inserted character sits).  Please note that
this line no longer makes any assumptions about the character being
inserted and deleted; it is only referred to by its position number.
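
The tail expansion can be sketched in Python like this.  The helper name parallel_pick is made up, and the fallback rule is only an assumption inferred from the description above (positions past the end of the shorter list yield its first character); it is not taken from JtR documentation and, as noted, is subject to change:

```python
def parallel_pick(chars, index):
    # Model of a \pN list: take the character at the same position as the
    # one currently chosen in list N.  Assumption: if this list is shorter,
    # positions past its end fall back to its first character.
    return chars[index] if index < len(chars) else chars[0]


inserted = " :_.1"  # the list that \p2 refers to
tails = [parallel_pick(":D", i) + parallel_pick(":l", i)
         for i in range(len(inserted))]

for ch, tail in zip(inserted, tails):
    print(repr(ch), "->", tail)
```

Only the second position produces "Dl"; every other position produces the two no-ops "::".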

I am likely to change the behavior in a future version of JtR such that
the above would need to be written as:

: [lc] Az"[: _.1][hH]ouse" \p2[D:]\p2[l:]

which I think would be more natural.  Since this change of behavior is
not in place yet, I will continue to assume the current behavior in the
rest of this message.

To fully address #3 and mitigate #4, we can do:

-[:c] \p1[lc] (?\p1[za] Az"[ :_.1][hH]ouse" \p4[:D]\p4[:l]

This will use "(?a" (reject unless starts with a letter) along with "c"
(capitalize), but "(?z" (effectively a no-op, but a non-free one
unfortunately) along with "l" (lowercase).

Additionally, it will use the "-c" rule reject flag (reject the rule
unless the target hash type is case-sensitive) along with the "c"
command (capitalize).  Thus, the number of effective duplicates for
case-insensitive hashes is reduced from 75% to 50%.  The remaining
duplicates will be because of "house" vs. "House", which is currently
not reasonable to address within just one ruleset line.  We could do:

-[:c] \p1[lc] (?\p1[za] Az"[ :_.1]\p1[hH]ouse" \p4[:D]\p4[:l]

which avoids the remaining duplicates for case-insensitive hashes, but
it also reduces the number of candidate passwords for case-sensitive
hashes - not all lowercase/capitalize combinations for the two words are
tried.  Maybe this change is desirable, maybe not.
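
The percentages and the trade-off can be checked with a small Python model.  It is a sketch under several assumptions: a case-insensitive hash is modeled by uppercasing candidates, the insert-then-delete of the magic character is modeled as "no separator", the "(?" checks are ignored because the input word "test" starts with a letter, and all names are made up for this example:

```python
import itertools

SEPS = " :_.1"  # ":" stands for "no separator" (inserted, then deleted)


def candidates(word, case_sensitive, reject_flag, tied):
    """Candidates for one input word under a simplified model of the lines above.

    reject_flag: use "-c" together with the "c" command
    tied:        tie h/H to l/c via a parallel range
    """
    out = []
    for case_cmd in "lc":
        if reject_flag and case_cmd == "c" and not case_sensitive:
            continue  # "-c" rejects this rule for case-insensitive hashes
        base = word.lower() if case_cmd == "l" else word.capitalize()
        firsts = ("h" if case_cmd == "l" else "H") if tied else "hH"
        for sep, first in itertools.product(SEPS, firsts):
            out.append(base + sep.replace(":", "") + first + "ouse")
    return out


def duplicate_share(cands, case_sensitive):
    """Fraction of candidates that are effectively repeats for the hash."""
    distinct = {c if case_sensitive else c.upper() for c in cands}
    return 1 - len(distinct) / len(cands)


for reject_flag, tied in [(False, False), (True, False), (True, True)]:
    insensitive = candidates("test", False, reject_flag, tied)
    sensitive = candidates("test", True, reject_flag, tied)
    print(reject_flag, tied,
          duplicate_share(insensitive, False), len(sensitive))
```

In this model the duplicate share for a case-insensitive hash goes from 0.75 to 0.5 with "-c" and to 0.0 with the tied h/H, while the number of candidates for a case-sensitive hash drops from 20 to 10 in the tied variant.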

To address #5, we can simply add a length check:

-[:c] <* \p1[lc] (?\p1[za] Az"[ :_.1][hH]ouse" \p4[:D]\p4[:l]

This rejects input words that leave no room for the addition of at least
one character.
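
Why the check matters can be illustrated with a short Python sketch.  The limit of 8 characters is hypothetical and chosen only for this example, the hash is assumed to ignore everything past that limit, and the function names are made up:

```python
MAX_LEN = 8  # hypothetical limit, for illustration only


def passes_length_check(word, max_len=MAX_LEN):
    # Model of "<*": keep the word only if it is shorter than the maximum
    return len(word) < max_len


def effective(candidate, max_len=MAX_LEN):
    # Assumption: the hash ignores everything past max_len characters
    return candidate[:max_len]


word = "password"  # already at the hypothetical limit
variants = [word + sep + "house" for sep in ("", " ", "_", ".", "1")]
distinct = {effective(v) for v in variants}

print(distinct)
print(passes_length_check(word), passes_length_check("test"))
```

Without the check, all five variants of the 8-character word are effectively the same candidate; with it, that word is rejected up front while "test" still passes.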

To address #6, we can use one of:

-[:c] <* \p1[lc] (?\p1[za] Az"house" T[zl] il[ :_.1] \p5[:D]\p5[:l]

or:

-[:c] <* \p1[lc] (?\p1[za] $[ :_.1] val1 Az"house" T[zl] \p4[:D]\p4[:a]

or:

-[:c] <* \p1[lc] (?\p1[za] $[ :_.1] val1 Az"house" [:T]\p5[:l] \p4[:D]\p4[:a]

All of these include the added word as-is, making it easy to generate
such a ruleset, but there's a performance impact.  Of these three, the
last one is the fastest (at least for a specific build of JtR 1.7.4 on
my computer), because the "$" and "v" commands are cheap compared to "i"
and because the "T" command is only generated (by the preprocessor) when
it is needed.

I have some thoughts on improving the rules engine further to make it
easier to write compact yet complicated ruleset lines like this and to
make them more efficient.  There are two major approaches to choose
from: fully rework the preprocessor such that it deals with substrings
rather than characters (to make it possible to write shell-like
expressions - e.g., "\p1{,(?a}" instead of "(?\p1[za]") or enhance both
the preprocessor and the rule commands interpreter in multiple minor
ways to address specific shortcomings.  I've started with the latter in
1.7.4, and I am likely to proceed that way for now.

For example, lists of characters with repeats could be a way to help
avoid duplicates with case-insensitive hashes in the above examples -
we could write "-\r[:ccc]" followed by "\p1[lclc]" and "\p1[hHHh]" to
specify the four combinations explicitly such that we're able to exclude
three of the four with case-insensitive hashes.

Another enhancement that would be handy for the above is an advanced
no-op command - nullify/skip next N commands.  With the current code,
this one is reasonably easy to implement such that it would actually
remove itself along with the next N commands, thereby eliminating their
performance impact.  Then, for example, "(?\p1[za]" could be written as
"N\p1[10] (?a", which would eliminate the runtime overhead currently
incurred in the "(?z" case.  (Detecting and optimizing out all of the
"z" and "Z" class checks is trickier.)  More importantly, the append
character command could be easily skipped, eliminating the need for
deleting the character when we did not want to insert one.

Any comments are welcome.

Alexander