```Date: Sat, 22 Apr 2023 18:21:40 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com

On Tue, Apr 18, 2023 at 07:12:09PM +0200, magnum wrote:
> \$ bc -l <<< '4096*(1-(4095/4096)^1024)'
> 906.12935699914074324992
>
> And that's even closer to my empirical data.
>
> Background: I was amazed how an 4-level bitmap that "should" give 1/16
> false positives, could end up as good as 1/43.  That was with 512 hashes
> in a 4x1024 bitmap. Using this formula we end up with 1/41, so mystery
> solved.

Yes, and using a Bloom filter you can improve this to 1/46 at k=6,
according to:

https://hur.st/bloomfilter/?n=512&m=4096

The question is then whether the two extra lookups are worth it, or if
the reduction in FP rate is so small that it's more efficient to go to
hash table in global memory after k=5 or k=4 (or even lower) anyway.
In this example, k=5 vs. k=6 have almost the same FP rate (k=5 has just
slightly higher), so k=5 is probably best for our needs.

I suspect what's optimal will vary by a lot of parameters (which GPU,
what hash table size, etc.)

So k becomes another tunable, maybe subject for auto-tuning, but should
always be below the threshold where FP rate starts to worsen (in the
above example, k=7 is worse).

> In another test case, 1024 hashes in a 8x1024 bitmap would be "full" and
> should give 1/1 false positives if we don't factor in the birthday
> paradox.  But empirical tests showed an amazing 1/38 - actually good
> enough to make an nvidia perform near its best!  This was again
> explained with the birthday paradox (which says 1/39).

This also improves to 1/46 at k=5 or k=6.

Alexander
```

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.