john-dev - Re: Birthday paradox



Message-ID: <20230423110803.GA17261@openwall.com>
Date: Sun, 23 Apr 2023 13:08:03 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Birthday paradox

On Sun, Apr 23, 2023 at 12:28:27PM +0200, magnum wrote:
> Oh, I see now I need longer tests because of the speed involved. At 
> 30Gp/s a nano-second timer indeed *is* low resolution. Still, the 
> difference is very small. I now bumped a 2.5 second test to 25 seconds 
> for the below.  Difference still is small and shows "steps" in reported 
> speed.
> 
> Starting without a bitmap:
> 
> $ for num in 256 ; do
>     head -$num scraped-bare.in > test.$num.in &&
>     for k in 0 1 2 3 4 5 6 ; do
>         rm -f ../run/john.pot &&
>         BLOOM_K=$k BITMAP_SIZE=$(($num*64)) ../run/john -form:nt-opencl \
>             -mask:?a -len=6 test.$num.in -lws=512 -gws=34816 -v:1 || break 2
>     done
>     rm test.$num.in
> done
> 
> Starting with no bitmap (BLOOM_K=0 disables):
> 
> Using default input encoding: UTF-8
> Loaded 256 password hashes with no different salts (NT-opencl [MD4 OpenCL])
> 256 hashes: Hash table in local memory (2468 B); no filter.
> Offset tbl 380 B, Hash tbl 2088 B, Results 3076 B, Dupe bmp 36 B, TOTAL on GPU: 5580 B
> LWS=256 GWS=34816 (136 blocks) x9025
> Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
> 25g 0:00:00:24 DONE (2023-04-23 12:04) 1.004g/s *30628Mp/s* 30628Mc/s 7190GC/s Dev#2:66°C util:95% aaUh}|..aaUh}|
> Remaining 231 password hashes with no different salts

This isn't 256 hashes throughout: 25 got cracked, leaving 231.  Maybe your other tests aren't, either?

> Session completed.
> 735091890625 crypts, fp from bitmap: 1/4153061529 (0.00%), 177 hash table lookups

735091890625 / 24 / 10^6 = 30628.8, i.e., the reported 30628Mc/s is the total crypt count divided by exactly 24 whole seconds.

Looks like the low-resolution timer is what we use for the figures on
the status line, and it has 1-second resolution in your tests.  On new
sessions, we normally start by using clock ticks in that computation:

	use_ticks = (c <= 0xffffffffU && !c_ehi && !status_restored_time);

However, in this case the speeds are so high that the crypt count c
exceeds 0xffffffff within a fraction of a second, so we stop using
ticks early.

I think we should switch to using floating-point inside that function,
so that we do not have to explicitly lower precision to avoid overflows.
Precision loss would then happen implicitly, in much finer-grained
steps, and the result would keep greater precision.  In other words,
let's continue to use wide integers (up to 96-bit, as we do now) to
count things, so that any precision loss doesn't accumulate, but use
"double" within the rate calculations.

Alexander
