Date: Mon, 16 Jan 2012 18:49:09 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitmaps

On Mon, Jan 16, 2012 at 06:25:26PM +0400, Solar Designer wrote:
> On Mon, Jan 16, 2012 at 03:00:35AM +0400, Solar Designer wrote:
> > hashes	c/s	VSZ	RSS
> ...
> > 1M	13M	121 MB	115 MB
> > 10M	5.7M	1046 MB	1039 MB
...
> probably not much parallelism for RAM accesses (it's more like
> pipelining).

This got me thinking: what stops me from similarly scheduling multiple
RAM accesses from a single thread?  So I tried the following sequence:

1. Extract hash2 for index+1.
2. Extract hash1 for index.
3. Do bitmap lookup 1 for hash1, but don't use the value yet.
4. Do bitmap lookup 2 for hash2, but don't use the value yet.
5. Use the result of bitmap lookup 1.
6. Use the result of bitmap lookup 2.

This got me to 7.5M c/s for one thread (non-OpenMP build) at 10M hashes,
as measured at the same time point as the 5.7M above.  For a longer
running session, this drops to 6.6M c/s, presumably because less similar
passwords are tried (lookups against the charsets in inc.c throw more of
the bitmap out of cache?).  At 1M hashes, the speedup is very small
(since the bitmap fits in L2 cache there).

This is not committed yet - it's a quick hack for testing only.

Somehow, moving the hash2 extraction to between bitmap lookup 1 and the
use of its result made the speed worse.  My guess is that this is because
the compiler has to save the register holding the result of bitmap
lookup 1 on the stack across the hash function call, which effectively
"uses" the lookup result early (even if only by saving it for later use).

Alexander
