john-users - Re: Cracking stats: p/s, c/s and C/s. Hashing cost factors.

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAF5WNqm8Tqyc+yJpR5fOyib6iHmTNRJXWkVA5-9s6tQ9ffxddg@mail.gmail.com>
Date: Wed, 5 May 2021 00:11:14 -0700
From: David Sontheimer <david.sontheimer@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Cracking stats: p/s, c/s and C/s. Hashing cost factors.

Alexander,
Apologies for the slow response time. I have been exploring
iterative cracking mode, and the longer run times that come with it.

These ratios make more sense now, but in absolute terms these speeds in
> the billions look unrealistic for the iterated hashes you previously
> mentioned you were testing with.  So I guess something else is still
> wrong in your analysis script.
>

Those speeds were unsalted raw MD5. These are --test benchmarks for the
hashes I mentioned previously:

Benchmarking: md5crypt, crypt(3) $1$ (and variants) [MD5 256/256 AVX2
8x3]... (72xOMP) DONE

> Many salts: 2726K c/s real, 37798 c/s virtual
Only one salt: 2609K c/s real, 36303 c/s virtual

Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X3]...
(72xOMP) DONE

> Speed for cost 1 (iteration count) of 32
Raw: 43416 c/s real, 607 c/s virtual


> Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 256/256 AVX2
4x]... (72xOMP) DONE

> Speed for cost 1 (iteration count) of 5000
Raw:    43885 c/s real, 611 c/s virtual

Benchmarking: sha1crypt, NetBSD's sha1crypt [PBKDF1-SHA1 256/256 AVX2
8x]... (72xOMP) DONE

> Speed for cost 1 (iteration count) of 40000
Raw:    20132 c/s real, 280 c/s virtual

Benchmarking: argon2 [Blake2 AVX]... (72xOMP) DONE

> Speed for cost 1 (t) of 3, cost 2 (m) of 4096, cost 3 (p) of 1, cost 4
(type [0:Argon2d 1:Argon2i]) of 0 and 1
Raw: 3291 c/s real, 46.8 c/s virtual

This is all on an older rack of 72 Xeon E5s.

I was able to change sha1crypt to a single 40k iteration count - much
appreciated - this will be a bit simpler to explain to my audience.

You can lock Argon2 benchmarks to either Argon2d or Argon2i with:
> ./john --test --format=argon2 --cost=0,0,0,0:0
> or:
> ./john --test --format=argon2 --cost=0,0,0,1:1
> The rest of Argon2 parameter values are: t=3, m=4096, p=1.


Is there documentation on the format of the --cost arguments for Argon2,
bcrypt, sha256_crypt, and sha512_crypt? I'd like to run benchmarks with the
costs matching the lowest that Passlib allows. For example, Passlib allows
a minimum of 1000 rounds for sha512_crypt, while John defaults to 5000.
When I attempt to alter the cost for these hashes it appears to either not
effect the result, or I receive an error:

FAILED (--cost pruned all 8 test vectors)


> Moreover, I guess your reference to [rainbow tables] was colloquial rather
> than
> literal - you probably meant a precomputed table in general, not a
> rainbow table specifically.  However, the answer would be no anyway.
>

Yes, colloquial. This is helpful to understand.

As I had mentioned, JtR does reuse the hashes it computes for multiple
> comparisons against possible multiple loaded hashes sharing a salt.
> However, this does not require maintaining any optimized nor long-term
> table of the computed hashes, except for having the set of just-computed
> hashes in memory.  In the simplest case, you simply have one computed
> hash at a time and you loop over all loaded hashes that have the same
> salt.  So you only store one currently computed hash in memory.  In the
> more complex case, which is what JtR typically has, multiple hashes are
> computed in parallel, and the current batch of computed hashes for the
> current salt (but different candidate passwords) is then mass-compared
> against the loaded hashes sharing this salt.  This mass-comparison
> operation is optimized in some ways, including through having
> performance-efficient data structures (bitmaps and hash tables) built
> out of loaded hashes (but not out of just-computed hashes).
>

For example, if I use 72 forks, but only load ten password hashes (all
sharing one uniform salt) then does John generate a batch of candidates
across forks, and mass-compare against all ten loaded hashes... then repeat
until matching candidates have been found for all ten loaded hashes?


> As a special case, JtR may omit duplicate hashes on loading, in which
> case the guess count and g/s reported while cracking will be limited to
> those hashes it had loaded.  "john --show" may then show a larger number
> of guesses (including duplicates that might have been omitted before).



> Given the hash types you're working with, you might face the first
> special case (as you appear to deliberately play with reused salts), but
> probably not the second.
>

This has certainly happened, and was bugging me for a while. Thank you.

>

> But we don't say "if a crypt matches ..." or the like.  We say "if a
> computed hash
> matches ..."
>

> Therefore, I should not refer to c/s as crypts tested, but crypts
> > generated, correct?
>
> This could have been correct, but we say "hashes computed".
>

Ok, many thanks. That's a helpful start on building my glossary.

I'm now cracking randomly generated passwords of short lengths, consisting
of only digits. I'm using iterative mode with both min-len and max-len
constrained by the known length of the cleartext (eg: if cleartext_len = 4,
then min-len=4 and max-len=4), with an alphabet of "Digits."

It appears that without min-len equivalent to max-len, John has internal
logic to begin with common password lengths. But with these constraints, in
what order does iterative mode generate candidates by default? I  imagine
John could generate candidates by iterating through the bounded
search-space by unicode value, regardless of number, letter or special
character. Character frequency analysis? Something else? Not a random seed,
no?

If there is complex logic in the generation order, is this contained within
the config file?

Similarly, how does John generate candidates from a wordlist, or mask +
wordlist?

Forest from the trees - I'd like to better understand how to relate my
theoretical time-to-crack calculations with the actual runtimes I'm seeing
from John.

-David
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.