john-users - Re: Rules characters unicode support.

Message-ID: <20201110220621.GA14993@openwall.com>
Date: Tue, 10 Nov 2020 23:06:21 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Rules characters unicode support.

On Tue, Nov 10, 2020 at 05:01:24PM +0100, François wrote:
> I've just finished writing the john.conf using your micro-optimization
> trick.
> 
> Three last questions before creating a pull request:

Great!

> 1- On the experimental file I'm working on, this rule is surprisingly
> effective (hundreds of passwords cracked). However, my sample happens
> to contain no uppercase, so my john.conf change only contains
> lowercase UTF-8. Do you want me to add uppercase?

It will be most flexible to have lowercase and uppercase as two separate
sections, then a section .include'ing both of those, and then have the
latter .include'd from [List.Rules:Jumbo].  That way, lowercase-only can
also be run by requesting just the corresponding ruleset.

> 2- Correct me if I'm wrong, but there is no obvious search-and-replace
> strategy for patterns of more than one letter in john's rules
> engine. I'm thinking of substituting two letters with one Unicode
> character, specifically:
> # Latin small letter thorn (th) -> þ
> # Latin small letter ae -> æ

There's no way to search for a two-character substring, but you can
search for the first character and then check the second:

/a Dp =pe

Unfortunately, if the very first "a" isn't followed by an "e", this will
reject the word instead of searching further.  You can partially
compensate for that by also having:

%2a Dp =pe

and so on.  Of course, you'll need to follow these with commands that
introduce the UTF-8 characters at position "p".
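
As a rough illustration of the behavior described above, here is a small Python model. It is not John's code and does not parse rule syntax; the function names (find_nth, delete_then_check, introduce_utf8) are hypothetical, and the last helper simply assumes that the "e" left at position p gets replaced by the UTF-8 bytes, since the actual commands for that step are not spelled out here.

```python
def find_nth(word: bytes, ch: bytes, n: int = 1):
    """Index of the n-th occurrence of ch, or None (models '/X' and '%NX')."""
    pos = -1
    for _ in range(n):
        pos = word.find(ch, pos + 1)
        if pos < 0:
            return None  # fewer than n occurrences: the word is rejected
    return pos


def delete_then_check(word: bytes, n: int = 1):
    """Model of '/a Dp =pe' (n=1) or '%Na Dp =pe' (n>1).

    Returns (word_after_delete, p), or None if the word is rejected.
    """
    p = find_nth(word, b"a", n)
    if p is None:
        return None
    word = word[:p] + word[p + 1:]   # Dp: delete the "a" found at p
    if word[p:p + 1] != b"e":        # =pe: the next character must be "e"
        return None
    return word, p


def introduce_utf8(word: bytes, p: int, char: str = "æ") -> bytes:
    """Assumed follow-up step: replace the byte at p with char's UTF-8 bytes."""
    return word[:p] + char.encode("utf-8") + word[p + 1:]


# "anaemia": the first "a" is followed by "n", so the n=1 rule rejects
# the word; only the n=2 variant reaches the "ae".
print(delete_then_check(b"anaemia", 1))   # None
print(delete_then_check(b"anaemia", 2))   # (b'anemia', 2)
```

The two calls at the end show why the %2a variant is needed in addition to the /a one: each rule only ever looks at one fixed occurrence of "a".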

Instead of the "D" command, you can have the rule calculate p+1 and
check the character there, or search for the second character and then
check the first at p-1 (fits the rule commands better, since adding 1
requires putting -1 into a variable first):

/e vap1 =aa
%2e vap1 =aa

This is likely quicker when the remaining portion of the word is long.
It's also better if your UTF-8 character is 2 bytes, because then you
can just do two overstrikes, at positions a and p.

I didn't test any of these now, but they should work.
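
The second variant (find the second character, check the first at p-1, then overstrike both positions) can be sketched in Python in the same spirit. Again this is only a model under stated assumptions, not John's implementation: the names find_nth and overstrike_digraph are hypothetical, and treating a match at position 0 as a rejection is my assumption, not something confirmed for the real rules engine.

```python
def find_nth(word: bytes, ch: bytes, n: int = 1):
    """Index of the n-th occurrence of ch, or None (models '/X' and '%NX')."""
    pos = -1
    for _ in range(n):
        pos = word.find(ch, pos + 1)
        if pos < 0:
            return None
    return pos


def overstrike_digraph(word: bytes, n: int = 1,
                       first: bytes = b"a", second: bytes = b"e",
                       repl: str = "æ"):
    """Model of '/e vap1 =aa' (n=1) or '%Ne vap1 =aa', plus two overstrikes.

    Returns the rewritten word, or None if the word is rejected.
    """
    utf8 = repl.encode("utf-8")
    if len(utf8) != 2:
        raise ValueError("this sketch only covers 2-byte UTF-8 characters")
    p = find_nth(word, second, n)            # /e or %Ne: sets p
    if p is None:
        return None
    a = p - 1                                # vap1: a = p - 1
    if a < 0 or word[a:a + 1] != first:      # =aa (a < 0 treated as reject: assumption)
        return None
    # Two overstrikes: byte at a and byte at p are replaced, length unchanged.
    return word[:a] + utf8 + word[p + 1:]


# "reggae": the first "e" follows "r", so n=1 rejects; n=2 finds the "ae".
print(overstrike_digraph(b"reggae", 1))
print(overstrike_digraph(b"reggae", 2))
```

Because the replacement is exactly as long as the two letters it overwrites, nothing after position p has to move, which matches the point about two overstrikes for a 2-byte character.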

> 3- Do you want me to provide the rules in best-match order? It
> might get a bit confusing; alternatively, I can group them by best
> unicode substitution order.

I have no preference, and I don't know what you mean by "best unicode
substitution".  I suspect these rules will usually be used as part of
the jumbo ruleset, in which case their number will be relatively small
and thus their order won't matter much.

However, I think "best-match order" is valuable if you have that data.

Alexander
