john-dev - Re: bitmaps

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120116142526.GA19440@openwall.com>
Date: Mon, 16 Jan 2012 18:25:26 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitmaps

On Mon, Jan 16, 2012 at 03:00:35AM +0400, Solar Designer wrote:
> hashes	c/s	VSZ	RSS
...
> 1M	13M	121 MB	115 MB
> 10M	5.7M	1046 MB	1039 MB

Curiously, these speeds improve to 33M and 21M, respectively, on 2xE5420
when I build with OpenMP and add the proper #pragma omp directive for
the bitmap lookups loop in cracker.c.  I only did this as a quick hack,
though - the real thing will need to differ in handling of correct
guesses (a critical section) - so I am not committing this yet.

I think these 2.5x and 3.7x speedups are pretty good considering that
even though the system has 8 CPU cores, it only has 4 L2 caches (IIRC,
it's 6 MB shared per each two cores, so 4x6 MB total in the system) and
probably not much parallelism for RAM accesses (it's more like
pipelining).  At these numbers of fast hashes, the overall speed is
mostly limited by bitmap and hash table lookups.

We could even do this sort of parallelization for the lookups for
formats where we don't yet have OpenMP parallelization in crypt_all(),
but where max_keys_per_crypt is large enough anyway.  The format will
need to indicate that its get_hash*(), cmp_one(), and cmp_exact() are
thread-safe by setting some flag, though.

Some further OpenMP speedup may be possible through eliminating the
implicit barrier between crypt_all() and the bitmap lookups loop, but
this is tricky to implement while preserving the generic program
structure (not specific to a hash type).  Well, maybe crypt_all() could
use nowait and atomic writes to some flags (as well as write barriers),
and the lookups loop or rather get_hash*() could spin when it sees a
flag that is not yet set (after a read barrier, although this should not
matter on x86).  We'd be betting that most of the time the flags would
be set in time and the spinning would not occur.  We might need to use
the ordered directive on the parallel loops to make this more likely.
These are just some ideas.  In general, the wisdom is that OpenMP is not
just for individual loops and we could start using it "for real" (beyond
per-format crypto code).

Alexander

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.