john-users - Re: how bitcoin2john works

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20201017211118.GA10877@openwall.com>
Date: Sat, 17 Oct 2020 23:11:19 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: how bitcoin2john works

Hi Jack,

On Sat, Oct 10, 2020 at 03:45:06PM +0000, Jack Anderson wrote:
> I want to know how the output hash is reached by file analysis and calculation.

How the wallet's password protection works:

The wallet is a Berkeley database containing, among other things, the
master key.  When you have a password set, the master key is encrypted
with a symmetric key derived from your password.  In order to slow down
password cracking, the symmetric key derivation method is deliberately
computationally expensive, as specified with parameters also stored in
the wallet.  In order to prevent amortizing the cost of attacks on
multiple wallets at once (have the cost proportional to the number of
wallets, rather than only to the number of candidate passwords tested)
and to prevent precomputation (require that the attack can only be
started once the attacker has a copy of the wallet), a random input
(so-called salt) to the key derivation method is used and is also stored
in the wallet.

How bitcoin2john works:

bitcoin2john uses the Berkeley database library, accessing it via a
Python module, to parse the wallet.dat file.  It extracts the fields
mentioned above (encrypted master key, salt, key derivation method and
its parameters), makes sure they appear to be supported by our code, and
if so encodes the needed information into the "hash" it outputs.

Specifically, the "hash" output by current bitcoin2john includes the
last two AES blocks of the encrypted master key, the salt, and the
iteration count to the only currently supported key derivation method
(one based on a large number of uses of SHA-512).

Older versions of bitcoin2john included more data in the output "hash"
(the full encrypted master key and two more fields), but we've since
enhanced bitcoin2john not to include that extra data in order to make
the output "hashes" less security-sensitive (enough for cracking the
password, but not for much else).

How password cracking works given the "hash":

For each candidate password, we derive the corresponding symmetric key.
We then use that symmetric key to decrypt the last AES block of the
master key.  We can do that without having to decrypt prior blocks (nor
to have more than two) due to how CBC mode works.  We then check the
PKCS#7 padding on the decrypted block.  Due to how supported master key
sizes map onto whole numbers of AES blocks, the last AES block contains
either 16 (is padding-only) or 8 bytes of padding.  Either way, correct
padding is an almost sure indicator that the password was correct.

In order to utilize modern hardware's parallel processing capabilities,
we buffer candidate passwords and test them (as above) in large batches.

Terminology issue:

You might have noticed that I put "hash" in quotes.  That's because this
isn't actually a hash, and we crack its password differently from how we
crack passwords against hashes.  In this case, we attempt decryption and
check whether the decrypted data is correct-looking.  For actual hashes,
we attempt hashing and check whether the computed hashes match the hash
being cracked.

With JtR supporting so many non-hashes like this lately, maybe we need
to adjust our wording to no longer refer to JtR inputs as hashes.  Maybe
call them targets, e.g. as in "Loaded X targets with Y different salts"?

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.