john-dev - Re: fast hash processing bottlenecks (was: ldr_split


Message-ID: <20150916193135.GA12611@openwall.com>
Date: Wed, 16 Sep 2015 22:31:35 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: fast hash processing bottlenecks (was: ldr_split_line() performance regression)

On Wed, Sep 16, 2015 at 09:11:53PM +0300, Solar Designer wrote:
> For 1 hash, it's 15 seconds.  Even for 1 hash, the c/s rate is only 1/3
> of what we see on --test benchmarks for raw-md5 (~7M out of ~21M per
> core on this 2x E5420).  So 2/3 is wordlist mode and rules and lookups
> overhead (but for only 1 hash it's clearly mostly not lookups).

For 1 $dummy$ hash (not raw-md5), I am getting ~9M c/s per process for
the same wordlist and rules, also with --fork=8.  (And mmap is slightly
faster than pre-loading.)  Curiously, when running just 1 process (no
fork), it gives ~10.5M c/s.  This old machine runs at 2.5 GHz regardless
of the number of active cores, and there's no HT.  Thus, the per-process
slowdown with 8 processes probably comes from a higher cache miss rate
and/or competition for RAM access.

Maybe we need to implement SSE2 prefetching for wordlist entries from
the memory buffer (whether mmap'ed or pre-loaded).  Right now, jumbo's
wordlist.c is too much of a mess for me to try doing this to it, but
maybe we should make this one of the goals for the rewrite that magnum
mentioned.
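
To make the prefetching idea concrete, here is a minimal sketch, not
jumbo's wordlist.c.  It assumes a newline-separated buffer (mmap'ed or
pre-loaded) and uses the GCC/Clang __builtin_prefetch() builtin instead
of SSE intrinsics.  The names walk_wordlist(), word_cb, sum_lengths()
and the PREFETCH_AHEAD distance are hypothetical.  For a purely
sequential walk the hardware prefetcher may already cover this, so any
gain would have to be measured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical prefetch distance in bytes; would need tuning per machine. */
#define PREFETCH_AHEAD 256

/* Callback invoked for every non-empty word (not NUL-terminated). */
typedef void (*word_cb)(const char *word, size_t len, void *ctx);

/*
 * Walk a newline-separated wordlist buffer, hinting the CPU to fetch
 * data PREFETCH_AHEAD bytes past the next word while the current word
 * is being processed.  Returns the number of non-empty words seen.
 */
static size_t walk_wordlist(const char *buf, size_t size,
                            word_cb cb, void *ctx)
{
    const char *p = buf, *end = buf + size;
    size_t count = 0;

    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *next = nl ? nl + 1 : end;
        size_t len = (size_t)((nl ? nl : end) - p);

        /* Only form the pointer if it stays inside the buffer. */
        if ((size_t)(end - next) > PREFETCH_AHEAD)
            __builtin_prefetch(next + PREFETCH_AHEAD, 0 /* read */, 0);

        if (len) {
            cb(p, len, ctx);
            count++;
        }
        p = next;
    }
    return count;
}

/* Example callback: accumulate total word length into a size_t. */
static void sum_lengths(const char *word, size_t len, void *ctx)
{
    (void)word;
    *(size_t *)ctx += len;
}
```

In a real cracker the callback would be where rules are applied and
candidates are queued for hashing; sum_lengths() is only a stand-in.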

A "rules first" mode would help greatly (apply each rule to the current
word, then advance to the next word; right now, we do it the other way
around, running the whole wordlist through each rule in turn, which
works better for probability-optimized rulesets and slow hashes).
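
The two loop orders can be sketched as follows.  This is an
illustration only, not John's rules engine: rule_fn, emit_fn, the toy
rules rule_copy() and rule_append_1(), and the cand_log helper are all
hypothetical.  The point is that the "rules first" order touches each
word once while it is still hot in cache, whereas the current order
re-reads the whole wordlist once per rule.

```c
#include <stddef.h>
#include <string.h>

typedef void (*rule_fn)(const char *in, char *out, size_t outsz);
typedef void (*emit_fn)(const char *candidate, void *ctx);

/* Current order: for each rule, pass over the entire wordlist. */
static void words_per_rule(const char *const *words, size_t nwords,
                           const rule_fn *rules, size_t nrules,
                           emit_fn emit, void *ctx)
{
    char out[128];
    for (size_t r = 0; r < nrules; r++)
        for (size_t w = 0; w < nwords; w++) {
            rules[r](words[w], out, sizeof(out));
            emit(out, ctx);
        }
}

/* "Rules first": apply every rule to one word, then advance. */
static void rules_per_word(const char *const *words, size_t nwords,
                           const rule_fn *rules, size_t nrules,
                           emit_fn emit, void *ctx)
{
    char out[128];
    for (size_t w = 0; w < nwords; w++)
        for (size_t r = 0; r < nrules; r++) {
            rules[r](words[w], out, sizeof(out));
            emit(out, ctx);
        }
}

/* Toy rule: copy the word unchanged (truncating to fit). */
static void rule_copy(const char *in, char *out, size_t outsz)
{
    size_t n = strlen(in);
    if (n >= outsz)
        n = outsz - 1;
    memcpy(out, in, n);
    out[n] = '\0';
}

/* Toy rule: append the digit '1' if there is room. */
static void rule_append_1(const char *in, char *out, size_t outsz)
{
    size_t n;
    rule_copy(in, out, outsz);
    n = strlen(out);
    if (n + 1 < outsz) {
        out[n] = '1';
        out[n + 1] = '\0';
    }
}

/* Toy sink: record candidates, space-separated, to show the order. */
struct cand_log { char buf[256]; size_t len; };

static void emit_log(const char *candidate, void *ctx)
{
    struct cand_log *l = ctx;
    size_t n = strlen(candidate);
    if (l->len + n + 2 > sizeof(l->buf))
        return; /* silently drop once the log is full */
    memcpy(l->buf + l->len, candidate, n);
    l->len += n;
    l->buf[l->len++] = ' ';
    l->buf[l->len] = '\0';
}
```

With words {"a", "b"} and rules {rule_copy, rule_append_1}, the first
function emits a, b, a1, b1 and the second emits a, a1, b, b1.  Both
produce the same candidate set; only the order and the memory access
pattern differ.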

The potential speed for the current raw-md5 code on this machine is:

Benchmarking: Raw-MD5 [MD5 128/128 SSE4.1 4x3]... DONE
Raw:    21811K c/s real, 21811K c/s virtual

per core.  But we don't reach anywhere near it with wordlist mode.
We do reach 21M+ per core with --fork=8 in, e.g., mask mode.  We also
get 20M for incremental mode locked to length 8, and 19.7M for
wordlist+mask (tested with -mask='?w?a?a').  But not for wordlist+rules.

Alexander
