Date: Wed, 26 Apr 2023 00:54:26 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox

Hi,

I guess others reading this might be surprised that magnum and I are
figuring out what the current code in JtR does.  That's because this
thread is specifically about code kindly contributed by Sayantan in
2015, which is used for a dozen "fast hash" OpenCL formats.  That's
important, but it's not our main/default hash lookup/comparison code.

On Tue, Apr 25, 2023 at 11:12:54PM +0200, magnum wrote:
> On 2023-04-25 17:15, Solar Designer wrote:
> >Cool.  Meanwhile, I recalled a drawback of Bloom filter vs. separate
> >bitmaps: you can't remove items from a Bloom filter on its own (without
> >host memory overhead to store extra info to allow for that, or
> >regenerating).  I guess the current code didn't bother removing from the
> >separate bitmaps anyway?
> 
> I'm afraid current formats rebuild the lot, totally unoptimized: 
> Nothing happens until the remaining count has dropped by 10% (and only 
> if more than 4500 remaining).  What happens then is a complete rebuild 
> of *everything* from scratch including PHT, kernel build, new autotune 
> I'm pretty sure, all memory buffers and bitmap with smaller sizes for 
> everything, and obviously new bitmap parameters such as k.  Since the 
> formats are FMT_REMOVE, cracked hashes get removed from the core 
> database during re-population of the fresh PHT.  If we had 300M hashes 
> and cracked 10%, this (particularly the new PHT) would obviously take 
> quite some time and hit total p/s seriously.

Yes, I see this in the code now.  I'm also concerned that every kernel
rebuild can potentially break it, but we only self-test using a build
tuned for our test hash count.  We should probably inject at least one
test hash into the PHT used by the build that actually runs, and then again
if we rebuild during the run after having removed many hashes.
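
Roughly the kind of check I have in mind, with made-up names (this is
not how the OpenCL formats are actually structured, just to illustrate
the idea):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * After the bitmaps/PHT for the real hash set are (re)built, push one
 * known binary through the same lookup path and bail out if it's missed.
 * known_test_binary() and gpu_tables_lookup() are hypothetical helpers.
 */
static void post_build_check(void)
{
    uint32_t binary[4];

    known_test_binary(binary);
    if (!gpu_tables_lookup(binary)) {
        fprintf(stderr, "Bitmap/PHT lost a known hash after rebuild\n");
        exit(1);
    }
}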

> I'll read up about cuckoo filters once I need some rest from my 
> profiling.  Then I'm not sure whether I'll finish the current work to 
> some degree, or just move on to experimenting with cuckoo instead.  It 
> depends on how complicated it gets.  Perhaps I should commit something, 
> as some kind of milestone.  Like, a proper Bloom filter but with mostly 
> the old parameter algo (which still works fairly well - the few bumps 
> where it is really suboptimal were there all the time).  Oh, and my 
> current code drops sources, saving 5 bytes per hash on host, and keeps 
> 128-bit binaries on host side (and in bitmap, for k > 2).

Committing a Bloom filter milestone makes sense to me.  Like I had said,
this would be our new baseline.
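
For reference, the kind of Bloom filter lookup we're talking about is
basically this (a host-side sketch with made-up sizes and names, taking
the k bit indices straight from 32-bit words of the hash binary, so it
assumes k is at most the number of words available; the kernel code
would differ):

#include <stdint.h>

#define BMP_BITS (1U << 24)  /* example size, power of two */
#define BMP_MASK (BMP_BITS - 1)

static uint32_t bmp[BMP_BITS / 32];

static void bloom_set(const uint32_t *h, unsigned int k)
{
    unsigned int i;

    for (i = 0; i < k; i++) {
        uint32_t bit = h[i] & BMP_MASK;
        bmp[bit >> 5] |= 1U << (bit & 31);
    }
}

static int bloom_maybe(const uint32_t *h, unsigned int k)
{
    unsigned int i;

    for (i = 0; i < k; i++) {
        uint32_t bit = h[i] & BMP_MASK;
        if (!(bmp[bit >> 5] & (1U << (bit & 31))))
            return 0;  /* definitely not a loaded hash */
    }
    return 1;  /* maybe loaded - still compare the full 128-bit binary */
}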

Then if I had time I'd probably proceed to reading that "paper 'Perfect
Spatial Hashing' by Lefebvre & Hoppe" referenced in Sayantan's comment.

I also notice that for the PHT we currently perform two lookups: first
into the offset table, and then into the hash table.  I guess the offset
table is needed because it's the one allowed to be less than 100% full,
so that less space is wasted (as the hash table's entries are larger)?
Well, you'd probably avoid that two-step process for a PHT with tiny
elements (fingerprints), but if you were to keep it then it's worse than
a cuckoo filter: there it's at most two lookups, meaning it's sometimes
one, and the fill is 98%, which I guess is higher than what we get for
the offsets here, or else Sayantan wouldn't have bothered with the
indirection.
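
To spell out what I mean by the two-step addressing, roughly (made-up
names and power-of-two masks; the real hashing and table layout differ):

/*
 * Lefebvre & Hoppe style addressing as I read it: a small offset table,
 * indexed by one hash of the key, perturbs a second hash into the main
 * hash table.  The second load cannot start until the first completes.
 */
static inline uint32_t pht_slot(uint32_t h0, uint32_t h1,
                                const uint32_t *offsets,
                                uint32_t offsets_mask,
                                uint32_t table_mask)
{
    uint32_t off = offsets[h1 & offsets_mask];  /* lookup 1 */

    return (h0 + off) & table_mask;  /* index for lookup 2 */
}

The caller then fetches hash_table[pht_slot(...)] and compares binaries,
so it's the offset load plus the hash table load, back to back.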

If we determine we can fully move to cuckoo filters with no performance
or memory loss, then that should let us avoid per-hash-count kernel
builds, thus also avoiding the self-[re]test difficulty.

I had thought the PHT let us do just one lookup, but if it's two anyway,
then there's little point in using a PHT.  Oh, even worse - they're two
sequential lookups here (we need to wait for the offset lookup to
complete before we can do the hash lookup), whereas with cuckoo we can
opt to do the two hash bucket lookups in parallel (although sometimes
only one will be needed), which may sometimes be faster (whether to do
so would be a tunable parameter).  OTOH, for some (medium) sizes the
offset table would fit in a cache.
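
For comparison, a cuckoo filter lookup along the lines of Fan et al.'s
partial-key cuckoo hashing would look roughly like this (made-up names,
a power-of-two bucket count and 16-bit fingerprints - all of that is up
for tuning):

#include <stdint.h>

#define FP_BITS      16
#define BUCKET_SLOTS 4

typedef uint16_t fp_t;

struct cuckoo {
    fp_t (*buckets)[BUCKET_SLOTS];
    uint32_t nbuckets_mask;  /* bucket count minus 1, power of two */
};

static int bucket_has(const fp_t *b, fp_t fp)
{
    int i;

    for (i = 0; i < BUCKET_SLOTS; i++)
        if (b[i] == fp)
            return 1;
    return 0;
}

static int cuckoo_maybe(const struct cuckoo *cf, uint32_t h0, uint32_t h1)
{
    fp_t fp = (fp_t)(h0 >> (32 - FP_BITS));
    uint32_t i1, i2;

    if (!fp)
        fp = 1;  /* reserve 0 for "empty slot" */

    i1 = h1 & cf->nbuckets_mask;
    /* the alternate bucket depends only on i1 and the fingerprint */
    i2 = (i1 ^ (fp * 0x5bd1e995U)) & cf->nbuckets_mask;

    /*
     * As written, the second bucket is checked only when the first one
     * misses (short-circuit).  Since both indices are known before any
     * memory access, we could instead issue both loads unconditionally
     * to remove the dependency - that's the tunable I mean above.
     */
    return bucket_has(cf->buckets[i1], fp) ||
           bucket_has(cf->buckets[i2], fp);
}

And unlike with a Bloom filter, removing a cracked hash here is just
clearing its fingerprint from whichever of the two buckets holds it.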

Alexander
