Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 18 Apr 2023 21:44:32 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox

On Tue, Apr 18, 2023 at 07:29:36PM +0200, Solar Designer wrote:
> On Tue, Apr 18, 2023 at 07:12:09PM +0200, magnum wrote:
> > Background: I was amazed how an 4-level bitmap that "should" give 1/16 
> > false positives, could end up as good as 1/43.  That was with 512 hashes 
> > in a 4x1024 bitmap. Using this formula we end up with 1/41, so mystery 
> > solved.
> > 
> > In another test case, 1024 hashes in a 8x1024 bitmap would be "full" and 
> > should give 1/1 false positives if we don't factor in the birthday 
> > paradox.  But empirical tests showed an amazing 1/38 - actually good 
> > enough to make an nvidia perform near its best!  This was again 
> > explained with the birthday paradox (which says 1/39).
> 
> Now you need to compare efficiency of these multiple bitmaps against one
> Bloom filter of their cumulative size.

One Bloom filter should have lower false positive rate per size than
multiple bitmaps do, however it may require more lookups to achieve such
lower FP rate.  Typical formulas for optimal Bloom filter parameters
optimize solely for lowest FP rate, ignoring the cost of greater number
of lookups.

> And then against a Cuckoo filter of the same size.

This is the best of both worlds: not only even lower FP rate per size
than Bloom filter, but also at most 2 lookups.

However, perfect hash tables, which we already use as the final level in
the code that you refer to, are an even further step in the same
direction, needing exactly 1 lookup.  Given that we already went for the
complexity there, is there a reason we're not doing the same for the
smaller early-reject tables in local memory?  Just fill them with tiny
fingerprints like a Cuckoo filter would use.

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.