Date: Sun, 13 Sep 2015 17:04:56 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: experiment with functions to reject computed hashes

On Sun, Sep 13, 2015 at 04:06:20PM +0300, Solar Designer wrote:
> On Sun, Sep 13, 2015 at 03:34:09PM +0300, Aleksey Cherepanov wrote:
> > I was curious if it is possible to make a function to reject computed
> > hashes useless for us. Such function has to:
> > - never reject hashes from given set,
> > - reject as many as possible other hashes.
> 
> We're already doing that with bitmaps and hash tables, and just
> yesterday I posted on other related ideas.  How is your idea different?
> Do you mean rejecting before hash computation is finished?  Even if so,
> we already have the crypt_all()/get_hash*()/cmp_all()/cmp_one() vs.
> cmp_exact() split.  Do you mean checks that are not effective enough
> (don't eliminate enough candidates) to postpone them until after
> crypt_all() returns, so that you want to do them inside crypt_all()
> before proceeding to more effective (yet also non-final) checks?

The tables are aimed at cutting down some hashes before the bitmap.
They are not meant to replace the bitmap.

I had this idea earlier; the thread about Judy arrays just triggered
the experiments.

> > So generated function can reject almost half of candidates using ~7
> > sse instructions (not counting load of tables but counting final
> > &0x0..0ff), cracking 32 hashes.
> 
> Isn't it more effective to go for bitmap lookups right away?

Couldn't it be faster with a higher number of hashes?

I guess the vector size matters much more.

> If cracking only 32 hashes, we can have the bitmap small enough that
> it's in L1 cache yet is very effective.  Of course, it'd be 4 lookups if
> cracking 4 hashes per 128-bit SIMD vector.  Is your "7 instructions" for
> all 4?  Even if so, it's not 7 vs. however many we'd need for bitmap
> lookups, but rather it's 7 plus at least 50% of the effort that we'd
> spend anyway, since your filter is less than 50% effective as you say.

"7 instructions" is for the whole vector, not for each value in
vector. But probability is for separate values.
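
For context, the shape I have in mind is roughly the following. This is
only a sketch of the idea using pshufb-style nibble lookups, not the
exact generated function; the table contents and names are placeholders:

#include <tmmintrin.h> /* SSSE3, for _mm_shuffle_epi8 */

/* tbl_lo / tbl_hi: 16-byte tables built from the loaded hashes.
   Lanes whose low byte comes out zero are rejected without any
   bitmap lookup; nonzero lanes may still match. */
static inline __m128i filter_vector(__m128i hashes,
                                    __m128i tbl_lo, __m128i tbl_hi)
{
    const __m128i nib = _mm_set1_epi8(0x0f);
    __m128i lo = _mm_and_si128(hashes, nib);                    /* low nibbles  */
    __m128i hi = _mm_and_si128(_mm_srli_epi16(hashes, 4), nib); /* high nibbles */
    __m128i bl = _mm_shuffle_epi8(tbl_lo, lo);  /* per-byte table lookups */
    __m128i bh = _mm_shuffle_epi8(tbl_hi, hi);
    /* a value can be in the set only if the two lookups share a bit;
       keep just the low byte of each 32-bit value (the final &0x0..0ff) */
    return _mm_and_si128(_mm_and_si128(bl, bh), _mm_set1_epi32(0xff));
}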

With a 50% per-value probability and a 4x vector, a full reject (all
four lanes) would occur only in 1/16 of cases. 85% with a 4x vector
would give a full rejection in about 50% of cases (0.85^4 ~= 0.52). So
with 85% and a 4x vector, it would be 1 check in 50% of cases and 1
check + 4 lookups in the other 50%. Trickier code might handle the
lanes separately.
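
A tiny worked example of that arithmetic (0.85 here is just a
hypothetical per-value rejection probability):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 0.85; /* per-value rejection probability (hypothetical) */
    int lanes = 4;   /* values per 128-bit SIMD vector */

    /* the whole vector is rejected only if every lane is rejected */
    double full_reject = pow(p, lanes); /* 0.85^4 ~= 0.52 */
    /* otherwise we still pay one check plus a lookup per lane */
    double avg_lookups = (1.0 - full_reject) * lanes;

    printf("full reject: %.0f%%, average lookups per vector: %.2f\n",
           100.0 * full_reject, avg_lookups);
    return 0;
}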

So I think there are a few cases where the tables might be useful:
- if the reject ratio is close to exact, to replace bitmaps,
- if lookups are very slow,
- if check cost + (1 - reject probability) * vector size * lookup cost <
     vector size * lookup cost (see the sketch below).
(And that's all without accounting for the time needed to build the
tables.)
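
To make the third case concrete, here is a minimal break-even check
with made-up cycle costs (the numbers are only for illustration, not
measurements):

#include <stdio.h>

int main(void)
{
    double check_cost  = 7.0;  /* the ~7-instruction filter, per vector */
    double lookup_cost = 10.0; /* one bitmap / hash table lookup        */
    double reject_prob = 0.85; /* per-value rejection probability       */
    double vector_size = 4.0;  /* values per 128-bit vector             */

    /* (1 - reject_prob) * vector_size is the expected number of
       surviving lanes, i.e. lookups, assuming lanes are handled
       individually (the "trickier code" case above) */
    double with_filter = check_cost +
        (1.0 - reject_prob) * vector_size * lookup_cost;
    double without_filter = vector_size * lookup_cost;

    printf("with filter: %.1f  without: %.1f  -> %s\n",
           with_filter, without_filter,
           with_filter < without_filter ? "filter pays off"
                                        : "filter loses");
    return 0;
}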

> And you say you're "not counting load of tables" (constants into SIMD
> registers?), which is going to cost too (we usually don't have spare
> registers across the entire hash computation).

The tables are not really constants; they have to be computed at the
moment when all hashes are loaded. I guess loading them would be 1
movdqa per table.
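
In intrinsics terms, that load would look roughly like this (assuming
the tables end up as 16 aligned bytes each; the names are mine,
matching the tbl_lo/tbl_hi placeholders above):

#include <emmintrin.h> /* SSE2 */

/* 16-byte tables rebuilt whenever the set of loaded hashes changes;
   the contents here are placeholders */
static unsigned char table_lo_bytes[16] __attribute__((aligned(16)));
static unsigned char table_hi_bytes[16] __attribute__((aligned(16)));

void load_tables(__m128i *tbl_lo, __m128i *tbl_hi)
{
    /* each of these should compile to a single movdqa */
    *tbl_lo = _mm_load_si128((const __m128i *)table_lo_bytes);
    *tbl_hi = _mm_load_si128((const __m128i *)table_hi_bytes);
}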

> I think most speedup when cracking few hashes is possible through
> reversing of steps, which we're not doing yet.  Until we do that, it's
> premature to optimize hash comparisons for that same special case.

My tables do not interfere with reversing (just as the bitmap does not
interfere); I think it is a separate question. But it is mostly fast
hashes that would benefit from this, because slow hashes spend most of
their time computing (and they are probably salted, so checks against
sets of just 1 hash are more likely). So yes, there are other, more
promising ways to improve fast hashes.

Thanks!

-- 
Regards,
Aleksey Cherepanov
