Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <90caeeccd43523489035376b43bc1e5c@smtp.hushmail.com>
Date: Tue, 25 Apr 2023 23:12:54 +0200
From: magnum <magnumripper@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox

On 2023-04-25 17:15, Solar Designer wrote:
> On Tue, Apr 25, 2023 at 04:24:19PM +0200, magnum wrote:
>> - I have yet to see the old bitmap code outperform the new in any
>> discouraging way.  It sometimes wins, but not by far and I'm nowhere
>> near finishing my "algo".  Endless testing ahead.
> 
> Cool.  Meanwhile, I recalled a drawback of Bloom filter vs. separate
> bitmaps: you can't remove items from a Bloom filter on its own (without
> host memory overhead to store extra info to allow for that, or
> regenerating).  I guess the current code didn't bother removing from the
> separate bitmaps anyway?

I'm afraid current formats rebuild the lot, totally unoptimized: 
Nothing happens until the remaining count has dropped by 10% (and only 
if more than 4500 remaining).  What happens then is a complete rebuild 
of *everything* from scratch including PHT, kernel build, new autotune 
I'm pretty sure, all memory buffers and bitmap with smaller sizes for 
everything, and obviously new bitmap parameters such as k.  Since the 
formats are FMT_REMOVE, cracked hashes get removed from the core 
database during re-population of the fresh PHT.  If we had 300M hashes 
and cracked 10%, this (particularly the new PHT) would obviously take 
quite some time and hit total p/s seriously.

I'll read up about cuckoo filters once I need some rest from my 
profiling.  Then I'm not sure whether I'll finish the current work to 
some degree, or just move on experimening with cuckoo instead.  It 
depends on how complicated it gets.  Perhaps I should commit something, 
as some kind of milestone.  Like, proper bloom filter but with mostly 
the old parameter algo (which still works fairly well - the few bumps 
where it is really suboptimal was there all the time).  Oh, and my 
current code drops sources, saving 5 bytes per hash on host, and keeps 
128-bit binaries on host side (and in bitmap, for k > 2).

magnum

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.