john-dev - Re: [patch] optional new raw sha1 implemetation

Message-ID: <20120619130344.GB16702@openwall.com>
Date: Tue, 19 Jun 2012 17:03:44 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [patch] optional new raw sha1 implemetation

Tavis,

On Tue, Jun 19, 2012 at 09:48:06AM +0200, Tavis Ormandy wrote:
> I noticed that one of the problems with setting it too high was that
> John calls cmp_one for every hash after a successful cmp_all. cmp_one
> really hurts, so if max_keys_per_crypt is too high, I lose some of the
> benefit to the overhead when there is a partial match.

While theoretically true, this issue should be of no practical relevance.
A typical setting of max_keys_per_crypt for a CPU-only format (no GPU)
may be hundreds or at most a few thousand.  (Larger values result in the
candidates not fitting in L1 data cache, which usually hurts.)
A partial binary ciphertext that you may be comparing against in
cmp_all() is typically 32-bit.  This means that you have 4 billion
possible values for it, and you're only comparing up to a few thousand
computed hashes against it.  So you get false positives (resulting in
calls to cmp_one() for every computed hash) with a frequency of roughly
up to 1 per million cmp_all() calls.  When that happens, you have
to call cmp_one() up to a few thousand times, but since each cmp_all()
call covers up to a few thousand computed hashes, that's equivalent to
roughly one cmp_one() call per million hashes computed.  If cmp_one()
costs roughly the same as one hash computation (and usually cmp_one() is
cheaper), the performance impact is thus around 0.0001%.
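
To make the arithmetic easy to check, here's a rough Python sketch of the
model described above.  It is not John's code: the function name and its
parameters are made up for illustration, and it assumes a single loaded
hash and uniformly distributed partial hash values.  The result is the
expected number of cmp_one() calls per computed hash, which bounds the
relative slowdown as long as cmp_one() costs no more than one hash
computation.

```python
# Back-of-the-envelope model of the cmp_all()/cmp_one() false-positive cost.
# All names here are hypothetical; this is not John the Ripper code.

def cmp_one_calls_per_hash(keys_per_crypt, partial_bits=32):
    """Expected cmp_one() calls per computed hash caused by false positives."""
    # Chance that at least one of the computed hashes matches the loaded
    # partial hash by accident (linear approximation, fine while it is << 1).
    p_false_positive = keys_per_crypt / 2.0 ** partial_bits
    # A positive cmp_all() triggers cmp_one() for every computed hash.
    cmp_one_calls_per_cmp_all = p_false_positive * keys_per_crypt
    # Normalize by the number of hashes computed per cmp_all() call.
    return cmp_one_calls_per_cmp_all / keys_per_crypt


if __name__ == "__main__":
    for n in (256, 1024, 4096):
        per_call = n / 2.0 ** 32
        per_hash = cmp_one_calls_per_hash(n)
        print(f"keys={n}: false positives per cmp_all() ~ {per_call:.2e}, "
              f"cmp_one() calls per hash ~ {per_hash:.2e}")
```

The per-hash figure grows only linearly with max_keys_per_crypt, so even
a setting several times larger than typical leaves it negligible.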

Now, with multiple hashes loaded for cracking (per salt if applicable)
and processed with the same cmp_all() approach, things get worse: the
false positive rate is multiplied by the number of hashes with that salt
(OK, there's no salt in this case).  However, this approach is only used
for very small numbers of hashes.  magnum-jumbo has
PASSWORD_HASH_THRESHOLD_0 at 3, meaning that this approach is used for 1
or 2 hashes only (per salt, when applicable).  With 3 or more, cmp_all()
should not be called at all.
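
As an illustration only, the threshold rule can be modeled like this.
The value 3 comes from the magnum-jumbo setting mentioned above; the two
helper functions are hypothetical and greatly simplify what John actually
does when it picks a comparison method.

```python
# Hypothetical model of choosing a comparison strategy by hash count.
# The threshold value 3 is taken from the text; everything else is assumed.

PASSWORD_HASH_THRESHOLD_0 = 3  # below this count, the cmp_all() scan is used


def uses_cmp_all(hashes_for_salt, threshold=PASSWORD_HASH_THRESHOLD_0):
    """True if the linear cmp_all() approach would be used for this salt."""
    return hashes_for_salt < threshold


def false_positive_rate(keys_per_crypt, hashes_for_salt, partial_bits=32):
    """Approximate chance per batch of keys of a spurious cmp_all() match."""
    # One cmp_all() pass per loaded hash, so the rate scales with the count.
    return hashes_for_salt * keys_per_crypt / 2.0 ** partial_bits


if __name__ == "__main__":
    for count in (1, 2, 3, 1000):
        if uses_cmp_all(count):
            rate = false_positive_rate(4096, count)
            print(f"{count} hash(es): cmp_all(), false positives ~ {rate:.2e}")
        else:
            print(f"{count} hash(es): get_hash() lookup, cmp_all() not called")
```

With at most 2 hashes on the cmp_all() path, the multiplier on the false
positive rate never exceeds 2 in this model.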

> I noticed an easy way to fix it is just to check if get_hash() ==
> binary_hash() first

This sounds very wrong to me.  When get_hash() is worthwhile to use (as
determined by the thresholds), John will use it instead of cmp_all() on
its own - with no magic needed on your part.

Perhaps there's some context to this that I am missing, though?

Thanks,

Alexander
