passwords - Don't Scratch Your Entropy

Message-ID: <bb947076-61f4-b613-0337-aa39cba0316c@bestmx.net>
Date: Sat, 9 Jul 2016 18:00:46 +0200
From: "e@...tmx.net" <e@...tmx.net>
To: passwords@...ts.openwall.com
Subject: Don't Scratch Your Entropy

I have a strong conviction that 99% of "security experts" do not know
the definition of entropy. This conviction will certainly seem wildly
deranged to you, unless you know the definition in question. So let's
begin with the definition, by the book.

H = -sum(p_i * log(p_i))

This is a function of the probability vector P = {..., p_i, ...} that 
represents a distribution of a random variable. Entropy is a 
characteristic of a distribution of a random variable. No more and no less.
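The definition translates directly into a few lines of code. The Python sketch below is illustrative only. It assumes base-2 logarithms, so H is measured in bits, and the name shannon_entropy is hypothetical. Note that its only argument is a probability vector. There is simply no parameter through which a password could be passed.

```python
import math


def shannon_entropy(probs, base=2.0):
    """Return H = -sum(p_i * log(p_i)) for a probability vector `probs`.

    Zero-probability terms are skipped (the usual convention 0 * log 0 = 0).
    """
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    return sum(-p * math.log(p, base) for p in probs if p > 0)


# The input is always a distribution: a fair coin, or a lopsided 3-way choice.
coin = shannon_entropy([0.5, 0.5])
lopsided = shannon_entropy([0.5, 0.25, 0.25])
```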

Let us find the entropy of your password. Your password is already
chosen, so its distribution vector is {1} (a single outcome with
probability 1); therefore your password's entropy is:

H = -1 * log(1) = 0

Your password's entropy is ZERO. Try log(1) in different bases on 
different computers if you are unsure.
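The same check can be run in code. The following is a small sketch with a hypothetical helper name. It shows that the degenerate distribution {1} gives zero entropy in base 2, base e and base 10 alike.

```python
import math


def shannon_entropy(probs, base):
    # H = -sum(p_i * log(p_i)), skipping zero-probability terms
    return sum(-p * math.log(p, base) for p in probs if p > 0)


# A password that is already chosen is one outcome with probability 1.
your_password_distribution = [1.0]

# log(1) is 0 in every base, so the single term vanishes each time.
results = {base: shannon_entropy(your_password_distribution, base)
           for base in (2, math.e, 10)}
print(results)
```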

A sophisticated reader may ask: "What if we apply entropy to the
password creation procedure?" This is doable in a seemingly reasonable
way. We can model any password creation procedure as a random choice
from a pool of candidate passwords, then characterize the password
distribution over this pool with the entropy. The resulting number tells
us how much information, on average, an output of our procedure
carries. So what? Is this number of any use in the context of "password
security"?

Security experts usually jump in here and claim that this number
represents the strength of the produced password. For the sake of
argument, let's accept this claim and construct a password creation
procedure as follows:
the password pool is {"123", "password",
"gtfr3467ujhbvcddgy6r5ddsefvvs", "###"};
we toss two coins and pick one of these four according to the coin
toss outcome.

The entropy of this procedure (assuming fair coins, so that the four
outcomes are uniformly distributed, and taking log to base 2) is:
H1 = -(1/4) * log(1/4) * 4 = 2
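The next sketch derives H1 from the coin tosses themselves instead of assuming the uniform distribution up front. It assumes two fair, independent coins. The mapping of outcomes to pool entries (HH to "123", HT to "password", and so on) is an arbitrary illustrative choice.

```python
import itertools
import math
from collections import Counter


def shannon_entropy(probs, base=2.0):
    # H = -sum(p_i * log(p_i)), skipping zero-probability terms
    return sum(-p * math.log(p, base) for p in probs if p > 0)


pool = ["123", "password", "gtfr3467ujhbvcddgy6r5ddsefvvs", "###"]

# The 4 equally likely outcomes of two fair coins: HH, HT, TH, TT.
outcomes = list(itertools.product("HT", repeat=2))
choice = {outcome: pool[i] for i, outcome in enumerate(outcomes)}

# Probability of each password = share of coin outcomes that select it.
counts = Counter(choice[o] for o in outcomes)
probs = [counts[pw] / len(outcomes) for pw in pool]

H1 = shannon_entropy(probs)
print(probs, H1)
```

Whichever password the coins select, the computed number is the same, because it is a property of `probs` and not of any entry in `pool`.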

Now, according to mainstream computer "science" (as dictated by the
NIST recommendations), we must label all our passwords with this
entropy value:
"123" has the entropy based strength 2
"password" has the entropy based strength 2
"gtfr3467ujhbvcddgy6r5ddsefvvs" has the entropy based strength 2
"###" has the entropy based strength 2.

This looks somewhat counterintuitive, and is not at all what you are
used to thinking about "entropy" when a respectable "expert" pronounces
the word with a straight face.

Furthermore, we can define another password creation procedure:
toss one coin and pick from the pool 
{"123","gtfr3467ujhbvcddgy6r5ddsefvvs"}.
The entropy of this procedure is half that of the previous one: 1.
Therefore:
the password "123" has the entropy based strength 1.

Yet the very same password "123" also has strength 2: one password has
two different strengths simultaneously. If we understand "strength" as
the likelihood of being guessed by the attacker, then a single password
cannot have two different values, because the hypothetical attack takes
the password alone as its input, not the password creation procedure.
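The contradiction can be reproduced mechanically. This sketch applies the rule under criticism, "label every password a procedure can emit with that procedure's entropy", to both procedures above. It assumes a uniform choice from each pool and base-2 logarithms. The helper names are hypothetical. It then collects every label attached to "123".

```python
import math


def shannon_entropy(probs, base=2.0):
    # H = -sum(p_i * log(p_i)), skipping zero-probability terms
    return sum(-p * math.log(p, base) for p in probs if p > 0)


def uniform_procedure_entropy(pool):
    # Entropy of picking one password uniformly at random from `pool`.
    return shannon_entropy([1 / len(pool)] * len(pool))


procedures = {
    "two coins": ["123", "password", "gtfr3467ujhbvcddgy6r5ddsefvvs", "###"],
    "one coin": ["123", "gtfr3467ujhbvcddgy6r5ddsefvvs"],
}

# The rule under criticism: a password inherits the entropy of any
# procedure that can emit it. Collect every label "123" receives.
labels_for_123 = {
    name: uniform_procedure_entropy(pool)
    for name, pool in procedures.items()
    if "123" in pool
}
print(labels_for_123)
```

The same string ends up with two different "strengths", 2 and 1. This shows that the label depends only on which procedure we chose to imagine.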

Thus, accepting the premise "the password creation entropy
characterizes a produced password", we end up with a contradiction.
Entropy is thereby shown not to be a function of a password. In a
slightly less mentally insane world I could have skipped this lengthy
demonstration altogether: entropy is simply defined as a function of a
random distribution. Who would have thought that it is also NOT a
function of anything else!

But I am not the champion of taking the longer route to obvious
conclusions. Matt Weir has conducted a meticulous experiment with
leaked passwords to support the statement: "entropy based password
strength measures do not provide any actionable information to the
defender", and also: "there is no way to convert the notion of Shannon
entropy into the guessing entropy of password creation policies". In
other words, he gave us experimental evidence that entropy is
irrelevant to the password strength problem. Of course it is
irrelevant! This irrelevance is plainly written in the definition of
entropy. Matt, you could have just read the definition and said:
"Corollary, dear 'experts': don't scratch your entropy". Nevertheless,
these experimental results are of great value for humanity, and I am
glad we have them; the more evidence, the better. In this world of
imbeciles, even the most obvious facts require tons of "proofs", since
the "experts" do not get along with mathematical logic very well.
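The following illustrative sketch makes Weir's second statement concrete. It assumes Massey's definition of guessing entropy, which is the expected number of guesses when candidates are tried from most to least likely. Weir's study may define the term differently. Both distributions are invented for the example and share the same Shannon entropy of 2 bits.

```python
import math


def shannon_entropy(probs, base=2.0):
    # H = -sum(p_i * log(p_i)), skipping zero-probability terms
    return sum(-p * math.log(p, base) for p in probs if p > 0)


def guessing_entropy(probs):
    # Expected number of guesses for an attacker who tries candidates in
    # decreasing order of probability: G = sum(i * p_(i)), with i from 1.
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


# Two made-up distributions with identical Shannon entropy (2 bits each).
uniform = [1 / 4] * 4
skewed = [1 / 2, 1 / 8, 1 / 8, 1 / 8, 1 / 8]

for name, dist in (("uniform", uniform), ("skewed", skewed)):
    print(name, shannon_entropy(dist), guessing_entropy(dist))
```

Both distributions have the same H but different expected guessing effort, 2.5 guesses versus 2.25. So H alone cannot tell you which one an attacker finishes faster.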

Still, there is more to the topic! Not only is the entropy of an
accurate password creation model irrelevant to the problem of password
strength, but the model itself is impossible to build in real-life use
cases. What distribution are you going to apply to human-created
passwords, given that (a) humans are incapable of randomization, and
(b) the pool of passwords they choose from is not accessible to us, not
even by vivisection of the brain? This makes entropy even worse than
irrelevant: it makes entropy ARBITRARY. Whatever distribution we assume
for a human-created password, it is inevitably baseless, arbitrary
garbage.
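A sketch of that arbitrariness follows. It pretends that we even know the pool a human picks from, using three passwords from the examples above. It then plugs in two hypothetical models of the human's choice. Neither model has any evidence behind it, yet each one yields a confident-looking number of bits.

```python
import math


def shannon_entropy(probs, base=2.0):
    # H = -sum(p_i * log(p_i)), skipping zero-probability terms
    return sum(-p * math.log(p, base) for p in probs if p > 0)


# Pretend we somehow know the pool a human picks from (we do not).
pool = ["123", "password", "###"]

# Two assumed models of the human's choice; both are pure guesses.
assumed_uniform = [1 / 3, 1 / 3, 1 / 3]
assumed_habitual = [0.98, 0.01, 0.01]
assert len(pool) == len(assumed_uniform) == len(assumed_habitual)

h_uniform = shannon_entropy(assumed_uniform)    # about 1.585 bits
h_habitual = shannon_entropy(assumed_habitual)  # about 0.161 bits
```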

Let's recap:

The entropy is a function of a distribution of a random variable.

Corollary:

(a) your password's entropy is 0

(b) every "security expert" pronouncing "entropy" without defining the
distribution, or at the very least the pool of candidate passwords, is
a brain-dead buffoon.