Date: Thu, 22 Dec 2011 23:55:26 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Bit slice implementation of DES based hashes

On Thu, Dec 22, 2011 at 11:31:14PM +0530, piyush mittal wrote:
> > Also, do you understand why the get_hash() functions process a few
> > initial elements of B[] only?
> 
> No, this I am not getting.  Why is only B[0] taken here?

Not only B[0] is taken - several initial elements of B[] are, ranging
from 4 elements for DES_bs_get_hash_0() to 27 for DES_bs_get_hash_6().
(I am referring to the 1.7.9 revision of these functions.)

> Is just the 1st bit of each hash being checked here?

JtR has several algorithms to check whether the computed hash matches
one of those loaded for cracking or not.  Which one of these algorithms
is used depends on the number of loaded hashes with a given salt.

A straightforward approach would be to have a get_hash() function that
would, for a given index (a bit layer, if we're talking bitslice),
return the full computed hash value.  For DES-based hashes, this function would
typically return a 64-bit value.  With a bitslice implementation, this
value would be based on all elements of B[] (with one bit taken out of
each element).  JtR would then compare this value against each hash
loaded for cracking (or each with the current salt, if applicable).
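
To make this concrete, here is a minimal sketch (hypothetical code, not
taken from JtR) of what such a full-hash extraction could look like for
one bit layer, assuming B[] holds 64 one-bit slices with one 64-bit
word per output bit:

#include <stdint.h>

/* Hypothetical sketch: assemble the full 64-bit result for bit layer
 * "index" by taking one bit out of each of the 64 elements of B[]. */
static uint64_t get_full_hash(const uint64_t *B, int index)
{
	uint64_t hash = 0;
	int i;

	for (i = 0; i < 64; i++)
		hash |= ((B[i] >> index) & 1) << i;

	return hash;
}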

But this would be inefficient, so JtR does not do it.  Instead, when the
number of loaded hashes (which share a given salt) is small, it calls
cmp_all() to compare one of the loaded hashes against all recently
computed hashes.  For example, a bitslice implementation on a machine
with 128-bit SIMD vectors (e.g., x86-64 with SSE2) may compute 128
hashes at a time.  The corresponding cmp_all() function compares one
loaded hash against all of the 128 computed hashes in fewer than 128
iterations.  On x86-64 it does this in 12 iterations on average (that's
2*log2(64): the 128-bit vector is handled as two 64-bit machine words,
each taking about log2(64) iterations on average).
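
As a rough illustration of that trick (this is not the actual
DES_bs_cmp_all() code; the names and data layout here are made up), one
loaded hash can be compared against 64 computed hashes at once by
keeping a mask of lanes that still agree with it and bailing out as
soon as no lane is left:

#include <stdint.h>

/* Sketch only: binary[i] is assumed nonzero iff bit i of the loaded
 * hash is set; B[i] holds bit i of all 64 computed hashes. */
static int cmp_all_sketch(const uint64_t *binary, const uint64_t *B)
{
	uint64_t alive = ~(uint64_t)0;	/* lanes that may still match */
	int i;

	for (i = 0; i < 64; i++) {
		alive &= binary[i] ? B[i] : ~B[i];
		if (!alive)
			return 0;	/* no computed hash can match */
	}

	return 1;	/* at least one lane matched all 64 bits */
}

With random data, the mask typically becomes zero after just a few
iterations, which is where the low average iteration counts come from.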

Now, when the number of loaded hashes is large (for the current salt, if
applicable), even the cmp_all() algorithm is not good enough.  We'd have
to be calling this function for each loaded hash, which would be
time-consuming.  Thus, one of the get_hash*() functions is called for
every computed hash instead, returning partial hashes (4-bit to 27-bit
in version 1.7.9).  This function's return value is used as an array
index to identify the hash bucket that might contain the computed hash
(if we have a full match).  Typically, the get_hash*() function and the
hash table size are chosen such that very few loaded hashes, if any,
share a hash bucket.  In fact, some hash buckets may be empty (in
those cases, it is instantly determined that a computed hash does not
match any of the hashes loaded for cracking).
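
A simplified sketch of that idea (made-up names and structures; JtR's
actual tables are more elaborate) might look like this, using a 12-bit
partial hash:

#include <stdint.h>

#define HASH_BITS 12			/* size of the partial hash */
#define HASH_SIZE (1 << HASH_BITS)

struct loaded_hash {
	uint64_t binary;		/* full loaded hash value */
	struct loaded_hash *next;	/* next entry in the same bucket */
};

static struct loaded_hash *bucket[HASH_SIZE];	/* many stay NULL (empty) */

/* For each computed hash, use its low HASH_BITS bits (what a
 * get_hash*()-style function would return) to pick one bucket; an
 * empty bucket instantly rules out a match. */
static struct loaded_hash *lookup(uint64_t computed)
{
	struct loaded_hash *p = bucket[computed & (HASH_SIZE - 1)];

	while (p && p->binary != computed)
		p = p->next;

	return p;			/* NULL means no loaded hash matched */
}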

If the number of hashes computed at a time is N (128 in the example
above) and the number of hashes loaded for cracking that share a given
salt is M, then the complexity of the two algorithms is roughly:

log2(N) * M for the cmp_all() algorithm
N for the algorithm involving get_hash*()

As you can see, the former is preferable for small M and the latter for
large M.
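
For instance, plugging in N = 128: the first estimate is about 7*M
(log2(128) = 7) while the second is a flat 128, so the hash table
approach starts to win once M grows beyond roughly 18 loaded hashes
for the salt.  (These are rough estimates only, of course.)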

In either case, if the algorithm indicates a likely match, it might not
tell us which of the loaded hashes was cracked or what the
corresponding index is (so that we can retrieve the plaintext password
that was tested).  Additionally, partial hashes may
be used with these algorithms (e.g., 32-bit), which is why I wrote
"likely".  Thus, in those relatively infrequent cases, further checks
are done with cmp_one() and cmp_exact() to confirm a match and to
identify which hashes got cracked (yes, more than one may get cracked
with one set of hash computations).

Do you see why the get_hash*() functions do not need to return the
full hash values?

> and does DEPTH here represent the total number of words present?

No.  DEPTH is a macro that may expand either to an empty string or to
[depth].  The latter is used when we're dealing with SIMD vectors larger
than the machine word size - e.g., 128-bit with SSE2, whereas the native
word size is just 64 or 32 bits.  In the get_hash*() functions, we need to
extract some bits from just one bit layer.  We do this with regular
(non-SIMD) operations and machine words (in fact, even these are wider
than necessary since we're dealing with just one bit layer in these
functions).  Thus, we split the bit layer index into two components:
native machine word "depth" (which word in a SIMD vector we're dealing
with currently) and bit position in that word.  You can see this in the
init_depth() macro:

#define init_depth() \
	depth = index >> ARCH_BITS_LOG; \
	index &= (ARCH_BITS - 1);
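
For example, with 64-bit machine words (ARCH_BITS 64, ARCH_BITS_LOG 6)
and a 128-bit SIMD vector, bit layer index 100 splits into depth 1 (the
second 64-bit word of the vector) and bit position 100 & 63 = 36 within
that word.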

Alexander
