Date: Thu, 17 Sep 2015 07:27:30 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: fast hash processing bottlenecks

On Wed, Sep 16, 2015 at 09:37:54PM +0200, magnum wrote:
> I think the rules engine has some low hanging fruit

The best64 rules use repeated simple commands instead of some of our
more advanced commands.  The attached patch optimizes our handling of
that.  With it, the best64 rules run faster, whereas our more typical
rulesets continue to run at their usual speed.
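
To illustrate what I mean by repeated simple commands (the rule text
below is made up, not quoted from best64): appending a three-character
suffix can be written as three single-character append commands or as
one insert-string command:

$1$2$3          three '$' (append character) commands
Az"123"         one 'A' command inserting the string "123" at the end

Both forms produce the same candidates; the patch makes the repeated
form cheaper to process.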

The 29M testcase is now down to:

real    0m43.621s
user    2m56.879s
sys     0m17.599s

It might make sense to further optimize this by having repeated $ and ^
commands translated into A commands when the rules are first parsed,
like we do for no-op squeezing.
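
To make the idea concrete, here's a rough sketch of such a parse-time
rewrite (made-up function name and handling, not code from rules.c,
and it assumes the preprocessor's character ranges have already been
expanded):

/*
 * Illustrative only: rewrite one rule, collapsing each run of four or
 * more consecutive $c (append character) commands into a single
 * Az"..." (append string) command.  With a threshold of 4, the
 * rewritten rule is never longer than the original, so "out" may be
 * sized the same as "in".  Runs of ^c would map to A0"..." similarly,
 * except that the collected characters must be reversed.
 */
void collapse_appends(const char *in, char *out)
{
	char *op = out;

	while (*in) {
		if (*in == '$' && in[1]) {
			int n = 0;
			/* Count the run; stop at '"', our quote character */
			while (in[2 * n] == '$' && in[2 * n + 1] &&
			    in[2 * n + 1] != '"')
				n++;
			if (n >= 4) {
				*op++ = 'A'; *op++ = 'z'; *op++ = '"';
				while (n--) {
					*op++ = in[1];
					in += 2;
				}
				*op++ = '"';
				continue;
			}
		}
		*op++ = *in++; /* anything else is copied through as-is */
	}
	*op = 0;
}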

A possibly more important optimization I haven't bothered with yet is
avoiding the copying from "in" to "out" in more commands, and instead
moving the start-of-string pointer by a few chars.  I haven't
implemented this so far because it's potentially problematic: a large
number of rule commands would then be able to move the pointer outside
of its buffer.  We'd need to check whether the pointer is still within
bounds, and that check would partially defeat the performance gain from
avoiding the copying.  Alternatively, we could make the string buffers
so much larger than the rule buffer that going out of bounds would be
impossible.  (The start-of-string pointers would need to be placed in
the middle of the buffers initially.)
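
Roughly, that layout could look like this (made-up names and sizes,
just to illustrate the idea):

#include <string.h>

/*
 * The working string starts in the middle of an oversized buffer,
 * leaving RULE_BUFFER_SIZE bytes of slack on each side.  The premise
 * is the one above: commands in a rule of at most RULE_BUFFER_SIZE
 * bytes can only move the start pointer by at most roughly that many
 * positions in total, so no per-command bounds check is needed.
 */
#define RULE_BUFFER_SIZE	0x100	/* example maximum rule length */
#define WORD_MAX		125	/* example maximum word length */
#define STR_BUFFER_SIZE		(2 * RULE_BUFFER_SIZE + WORD_MAX + 1)

struct work {
	char buf[STR_BUFFER_SIZE];
	char *start;		/* current start of string */
	int length;
};

static void work_init(struct work *w, const char *word)
{
	w->length = strlen(word);
	w->start = w->buf + RULE_BUFFER_SIZE;	/* mid-buffer */
	memcpy(w->start, word, w->length + 1);
}

/* '^' (prefix a char): move the pointer left instead of shifting bytes */
static void work_prefix(struct work *w, char c)
{
	*--w->start = c;
	w->length++;
}

/* '[' (delete first char): likewise, just move the pointer right */
static void work_del_first(struct work *w)
{
	if (w->length) {
		w->start++;
		w->length--;
	}
}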

Alexander

View attachment "john-repeated-rule-commands.diff" of type "text/plain" (3200 bytes)
