|
|
Message-ID: <20150913140456.GA1718@openwall.com>
Date: Sun, 13 Sep 2015 17:04:56 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: experiment with functions to reject computed hashes
On Sun, Sep 13, 2015 at 04:06:20PM +0300, Solar Designer wrote:
> On Sun, Sep 13, 2015 at 03:34:09PM +0300, Aleksey Cherepanov wrote:
> > I was curious if it is possible to make a function to reject computed
> > hashes useless for us. Such function has to:
> > - never reject hashes from given set,
> > - reject as many as possible other hashes.
>
> We're already doing that with bitmaps and hash tables, and just
> yesterday I posted on other related ideas. How is your idea different?
> Do you mean rejecting before hash computation is finished? Even if so,
> we already have the crypt_all()/get_hash*()/cmp_all()/cmp_one() vs.
> cmp_exact() split. Do you mean checks that are not effective enough
> (don't eliminate enough candidates) to postpone them until after
> crypt_all() returns, so that you want to do them inside crypt_all()
> before proceeding to more effective (yet also non-final) checks?
The tables are aimed to cut down some hashes before bitmap. They are
not aimed to replace bitmap.
I had this idea earlier, so thread about Judy arrays just triggered
experiments.
> > So generated function can reject almost half of candidates using ~7
> > sse instructions (not counting load of tables but counting final
> > &0x0..0ff), cracking 32 hashes.
>
> Isn't it more effective to go for bitmap lookups right away?
Could not it be faster with higher number of hashes?
I guess size of vectors matter much more.
> If cracking only 32 hashes, we can have the bitmap small enough that
> it's in L1 cache yet is very effective. Of course, it'd be 4 lookups if
> cracking 4 hashes per 128-bit SIMD vector. Is your "7 instructions" for
> all 4? Even if so, it's not 7 vs. however many we'd need for bitmap
> lookups, but rather it's 7 plus at least 50% of the effort that we
> since your filter is less than 50% effective as you say.
"7 instructions" is for the whole vector, not for each value in
vector. But probability is for separate values.
Having 50% probability and 4x vector, full reject would occur only in
1/16 cases. 85% and 4x vectors would give full rejection in 50% cases.
So with 85% and 4x vector, it would be 1 check in 50% and 1 check + 4
lookups in other 50%. Trickier code might separate probabilities.
So I think there are a few cases when the tables might be useful:
- if reject ratio is close to precise, to replace bitmaps,
- if lookups are very slow,
- if check cost + (1 - reject probability) * vector size * lookup cost <
vector size * lookup cost
(and that's all without respect to time of building tables.)
> And you say you're "not counting load of tables" (constants into SIMD
> registers?), which is going to cost too (we usually don't have spare
> registers across the entire hash computation).
The tables are not really constants, they should be computed at moment
when all hashes are loaded. I guess it would be 1 movdqa per table.
> I think most speedup when cracking few hashes is possible through
> reversing of steps, which we're not doing yet. Until we do that, it's
> premature to optimize hash comparisons for that same special case.
My tables does not interfere with reversing (like bitmap do not
interfere). I think it is a separate question. But faster hashes would
benefit from this mostly because slower hashes spend most time
computing (and they are probably salted, so checks against sets of 1
hash are more expectable). So yes, there are other, more promising
ways to improve fast hashes.
Thanks!
--
Regards,
Aleksey Cherepanov
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.