```Date: Tue, 18 Apr 2023 22:31:04 +0200
From: magnum <magnumripper@...hmail.com>
To: john-dev@...ts.openwall.com

On 2023-04-18 21:44, Solar Designer wrote:
> On Tue, Apr 18, 2023 at 07:29:36PM +0200, Solar Designer wrote:
>> On Tue, Apr 18, 2023 at 07:12:09PM +0200, magnum wrote:
>>> Background: I was amazed how an 4-level bitmap that "should" give 1/16
>>> false positives, could end up as good as 1/43.  That was with 512 hashes
>>> in a 4x1024 bitmap. Using this formula we end up with 1/41, so mystery
>>> solved.
>>>
>>> In another test case, 1024 hashes in a 8x1024 bitmap would be "full" and
>>> should give 1/1 false positives if we don't factor in the birthday
>>> paradox.  But empirical tests showed an amazing 1/38 - actually good
>>> enough to make an nvidia perform near its best!  This was again
>>> explained with the birthday paradox (which says 1/39).
>>
>> Now you need to compare efficiency of these multiple bitmaps against one
>> Bloom filter of their cumulative size.
> (...)
>> And then against a Cuckoo filter of the same size.
>
> This is the best of both worlds: not only even lower FP rate per size
> than Bloom filter, but also at most 2 lookups.
>
> However, perfect hash tables, which we already use as the final level in
> the code that you refer to, are an even further step in the same
> direction, needing exactly 1 lookup.  Given that we already went for the
> complexity there, is there a reason we're not doing the same for the
> smaller early-reject tables in local memory?  Just fill them with tiny
> fingerprints like a Cuckoo filter would use.

Assuming by "lookup" you mean the actual hash tables lookup, and by
"early-reject" you mean the bitmap... Are you saing we should build an
early-reject table more similar to the actual hash table?

I'm merely trying to understand all bits and pieces involved. The bitmap
is always "a single lookup", but it can consist of up to eight reads of
(hopefully local) memory:

---8<--------------8<--------------8<--------------8<-----------
uint bitmap_index = hash[1] & BITMAP_MASK;

uint tmp = (bitmaps[BITMAP_SHIFT * 0 + (bitmap_index >> 5)] >>
(bitmap_index & 31)) & 1U;
#if SELECT_CMP_STEPS == 2

tmp &= (bitmaps[BITMAP_SHIFT * 1 + (bitmap_index >> 5)] >>
(bitmap_index & 31)) & 1U;
#endif
#if SELECT_CMP_STEPS == 4
(...)
---8<--------------8<--------------8<--------------8<-----------

The BITMAP_SHIFT macro and the occasional multiplication are all
compile-time constants so should be optimized away. Then each bitmap
step is just an add, a couple shifts and a couple ANDs.

Here's the code for a hash table look-up once we passed the bitmap test:

---8<--------------8<--------------8<--------------8<-----------
uint t, hash_table_index;

ulong hash64;

hash64 = ((ulong)hash[1] << 32) | (ulong)hash[0];

hash64 += (ulong)offset_table[hash64 % OFFSET_TABLE_SIZE];

hash_table_index = hash64 % HASH_TABLE_SIZE;

if (hash_table[hash_table_index] == hash[0] &&

hash_table[hash_table_index + HASH_TABLE_SIZE] == hash[1]) {

/* we've got a crack */
---8<--------------8<--------------8<--------------8<-----------

Like we discussed recently, the modulo operations are compile-time
constants so should be fairly optimized - but perhaps they are still
relatively expensive?  I tried disabling the bitmap completely, to see
how bad it would be, and speed plummeted.  I've been meaning to re-try
that but with the offset and hash tables in local memory (when they fit
of course). But if I recall correctly, I even tried replacing the bitmap
lookup (for a single hash loaded) with a constant compare [if (hash[1] =
0xcafebabe)] and even *that* did not run any faster than the current
bitmap code. Not sure why really.

Anyway I've mostly been tinkering with "many" hashes and how to optimize
that. One idea I've had is for the case where eg. a 4x bitmap doesn't
fit in local memory: Perhaps the first (or first two) steps can be in
local, and the rest in global?

Well, many ideas, lots of basic research, slow progress.

magnum

```

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.