john-dev - Re: eliminating barrier between crypt_all() and hash comparison/lookups

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20120625045617.GA8280@openwall.com>
Date: Mon, 25 Jun 2012 08:56:17 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: eliminating barrier between crypt_all() and hash comparison/lookups

On Sat, Jan 21, 2012 at 09:41:25PM +0400, Solar Designer wrote:
> On Mon, Jan 16, 2012 at 06:25:26PM +0400, Solar Designer wrote:
> > Some further OpenMP speedup may be possible through eliminating the
> > implicit barrier between crypt_all() and the bitmap lookups loop, but
> > this is tricky to implement while preserving the generic program
> > structure (not specific to a hash type).  Well, maybe crypt_all() could
> > use nowait and atomic writes to some flags (as well as write barriers),
> > and the lookups loop or rather get_hash*() could spin when it sees a
> > flag that is not yet set (after a read barrier, although this should not
> > matter on x86).
> 
> I looked into this possibility.  No, it won't work without changing the
> program structure significantly because nowait may only be used within a
> parallel region, which has to be closed (and thus an implicit barrier
> introduced) before the end of the function (crypt_all() in this case).

Actually, the function holding the parallel region does not have to be
crypt_all() itself.  It can as well be the caller function in cracker.c.
Then crypt_all() would use "#pragma omp for nowait" without the
"parallel" keyword.

Maybe a format could use a new FMT_* flag to tell cracker.c that its
crypt_all() (and cmp_all() and get_hash*() maybe?) should be called from
within a parallel region in the caller.  Then this would fit in the
existing formats interface.

The attached standalone SHA-512 cracker demonstrates this approach.
It runs slightly faster for me with "nowait" than without.  There's some
overhead to maintain the computed hash revision numbers, though - when I
remove those, it runs faster yet, but then it's unreliable.  (To actually
trigger the problem in that case, reduce KEYS_PER_CRYPT to be same as
thread count and use the reverse order loop in cmp_all() - I wrote that
line specifically to make it more likely to hit not-yet-computed hashes.)

Better efficiency may be achieved by having per-thread rather than
per-hash flags.  However, then we have to allocate key/hash indices to
threads manually (not rely on OpenMP to do this magic).

So we do in fact have the "nowait" alternative to the other non-pretty
approach, which I described below:

> What we could do is move (or duplicate) some logic from cracker.c,
> cmp_*() and get_hash_*() into crypt_all().  The new crypt_all() would
> then need to accept e.g. a pointer to struct db_salt, so it would check
> if there are possibly any guesses or not before returning control to
> cracker.c (which would then identify the specific guessed passwords with
> a loop much like it does now - but only if crypt_all() returned that
> there might be any).  Unfortunately, this mixes up different layers of
> the program.  While it is no surprise that mixing them should improve
> performance at fast hashes, it is bad for code readability and further
> maintenance.  Also, if implemented the obvious way, we may even get
> circular #include's between formats.h and loader.h.  Merging these into
> one file would probably go against source code readability even further.
> 
> So I haven't decided what to do on this yet.

Alexander

View attachment "nowait.c" of type "text/plain" (2384 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.