john-users - Re: Cracking stats: p/s, c/s and C/s. Hashing cost factors.

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210414175030.GB2810@openwall.com>
Date: Wed, 14 Apr 2021 19:50:30 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Cracking stats: p/s, c/s and C/s. Hashing cost factors.

On Mon, Apr 12, 2021 at 02:41:48PM -0700, David Sontheimer wrote:
> Found and resolved a bug in my code. My stats make a lot more sense now, in
> relation to your explanations.
> 
> Total pwd hashes: 100
> Total unique salts: 1
> Pwds tested/sec: 1.489019e+09
> Crypts tested/sec: 1.489019e+09
> Combinations tested/sec: 1.490530e+11
> 
> C/c: 100.10147016257012
> c/p: 1.0
> p/c: 1.0
> C/p: 100.10147016257012

These ratios make more sense now, but in absolute terms these speeds in
the billions look unrealistic for the iterated hashes you previously
mentioned you were testing with.  So I guess something else is still
wrong in your analysis script.

> Is it accurate to interpret that JtR creates a rainbow table -

No, absolutely not correct.  Rainbow tables had their time (a few years
a couple of decades ago) and their uses for unsalted hashes, and
occasionally still find their uses in rare special cases, but are not a
good choice for most practical purposes, and were never used in JtR.
Moreover, I guess your reference to them was colloquial rather than
literal - you probably meant a precomputed table in general, not a
rainbow table specifically.  However, the answer would be no anyway.

> based on
> candidate passwords generated from rule set (and/or wordlist) - that
> includes crypts computed with a repeat candidate password for each salt
> encountered in the password file?

I don't know how exactly to interpret what you wrote, so I'll explain in
my own words:

As I had mentioned, JtR does reuse the hashes it computes for multiple
comparisons against possible multiple loaded hashes sharing a salt.
However, this does not require maintaining any optimized nor long-term
table of the computed hashes, except for having the set of just-computed
hashes in memory.  In the simplest case, you simply have one computed
hash at a time and you loop over all loaded hashes that have the same
salt.  So you only store one currently computed hash in memory.  In the
more complex case, which is what JtR typically has, multiple hashes are
computed in parallel, and the current batch of computed hashes for the
current salt (but different candidate passwords) is then mass-compared
against the loaded hashes sharing this salt.  This mass-comparison
operation is optimized in some ways, including through having
performance-efficient data structures (bitmaps and hash tables) built
out of loaded hashes (but not out of just-computed hashes).

> Perhaps I'm approaching it with non-specific terminology. After
> reading the FAQ and your explanations, let me try again.
>
> Hash: A cryptographic hash of a cleartext password (and potentially a
> salt) stored in a password file. Aka a password hash. The thing we're
> using JtR to crack. Therefore g/s is the number of target hashes
> successfully cracked per second.

OK so far.

As a special case, JtR may omit duplicate hashes on loading, in which
case the guess count and g/s reported while cracking will be limited to
those hashes it had loaded.  "john --show" may then show a larger number
of guesses (including duplicates that might have been omitted before).

As another special case, for a few especially weak hash types such as LM
and bigcrypt, JtR may split input file's hashes into halves at loading,
and then it counts the individual halves as hashes and for successful
guesses during cracking.  Again, "john --show" will correct for that in
its output of actual plaintext passwords, combining the half-guessed
plaintexts, but not in terms of its final line with the counts, which
will continue to count the halves individually.

Given the hash types you're working with, you might face the first
special case (as you appear to deliberately play with reused salts), but
probably not the second.

> Crypt: A cryptographic hash generated by JtR from a candidate password
> (and, if present in the password file, salt). If a crypt matches a
> hash, JtR records the associated candidate as a successful guess to
> the pot file.

This could have been right, except that historically "crypt" refers to
the Unix crypt(3) function, and thus to the process and effort of
computing a password hash - not to the computed hash value.  So it's OK
to say "crypts per second" meaning the computation made and effort
incurred per second (even though crypt(3) is not normally used in JtR,
and multiple hashes are normally computed in parallel).  But we don't
say "if a crypt matches ..." or the like.  We say "if a computed hash
matches ..."

> Combination: Candidate password I understand, but which "target hash"
> are we referring to - a password hash, correct? If so, then C/s is the
> speed with which JtR compares a candidate's hash to a target hash? And
> if subsets of target hashes share a salt, JtR will compare a
> candidate's hash to the entire subset of target hashes until it finds
> a match - allowing for C/s >= c/s.

That's correct.

> Therefore, I should not refer to c/s as crypts tested, but crypts
> generated, correct?

This could have been correct, but we say "hashes computed".

Alexander

P.S. I've now changed the sha1crypt "--test" benchmark to assume 20000
iterations.  This is in bleeding-jumbo.  This is to match hashcat's
default for its benchmark, which was presumably chosen to be similar to
iteration counts seen on those hashes on Juniper routers and switches.

There doesn't appear to be any commonly used default iteration count for
sha1crypt, except that
https://rchouinard.github.io/phpass/api/class-PHPassLib.Hash.SHA1Crypt.html
uses a default of 40000, but I decided that benchmark-compatibility with
hashcat is more important for us.
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.