passwords - Re: Submitting Partial Password Hashes to Pwned Password Lookup

Message-ID: <443898ec-0583-1870-4725-d8dcdbb53abc@bluepopcorn.net>
Date: Mon, 12 Mar 2018 15:01:24 -0700
From: Jim Fenton <fenton@...epopcorn.net>
To: passwords@...ts.openwall.com
Subject: Re: Submitting Partial Password Hashes to Pwned Password
 Lookup

On 3/12/18 1:19 PM, Matt Weir wrote:
> This e-mail has already grown too large as it is, but I’d be
> interested in other people’s thoughts on this subject. Am I
> misunderstanding the use of K-anonymity? How should we look at the
> security of this approach?

Hi Matt,

I have been looking for the appropriate venue to write something about
Pwned Passwords, so thanks for the prompt.

tl;dr: Whether you call this k-anonymity or not, I have concerns.

The concern with this, of course, is that some attack (perhaps an attack
on the web server or CDN) might give an attacker access to the queries
and associated IP addresses. With the IP addresses, it might be possible
to determine who the user (and their userID) is. With a full hash, it
would be possible to crack that hash to find the password.

But these hashes also come with frequency statistics. So the attacker
can just start with the most likely hash (the one with 47205 occurrences
in your example) and work down the frequency list from there. There are
475 hashes representing 53006 password instances in the page you cited,
so it's likely that you'd only need to crack that one hash, which alone
accounts for about 89% of those instances. Your analogy with past use of
Shannon entropy is
indeed accurate.
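
As a rough sketch of that attack order, the snippet below ranks the
entries of one prefix bucket by frequency and reports how much of the
bucket the top entry covers. It assumes the bucket arrives as lines of
SUFFIX:COUNT text; the function names and the three-entry sample are
hypothetical, and only the 47205 and 53006 figures come from the
discussion above.

```python
def rank_bucket(range_response):
    """Parse 'SUFFIX:COUNT' lines; return (suffix, count) pairs, most frequent first."""
    entries = []
    for line in range_response.strip().splitlines():
        suffix, _, count = line.strip().partition(":")
        entries.append((suffix, int(count)))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def top_share(ranked):
    """Fraction of all password instances in the bucket covered by the top entry."""
    total = sum(count for _, count in ranked)
    return ranked[0][1] / total if total else 0.0


# Hypothetical bucket: the suffixes are placeholders, not real hashes.
sample = "AAA:12\nBBB:47205\nCCC:3"
ranked = rank_bucket(sample)
print(ranked[0], round(top_share(ranked), 4))

# Share of the cited page held by its most frequent hash.
cited_share = 47205 / 53006
print(round(cited_share, 3))
```

An attacker who sees only the prefix would try candidates in the order
returned by rank_bucket, so the anonymity set is much smaller in
practice than the raw count of hashes in the bucket suggests.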

Another concern I have is with the API itself. Many web servers by
default log the URLs of their requests, so with this API it might log
the IP address of the requester and the prefix of the hash. I'm sure
Troy has this all covered but it sounds like he has an elaborate CDN to
handle the flood of requests he gets for this, and I hope they have this
covered as well. A better approach would be to use a POST request and
put the hash prefix in the body of the request (although I don't know
what effect that might have on the CDN).
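
To make the logging difference concrete, here is a minimal sketch that
builds both kinds of request without sending them. The GET URL is the
range endpoint as I understand the public API; the POST endpoint is
purely hypothetical (no such endpoint is known to exist), as are the
helper names.

```python
import hashlib
import urllib.request

# Range endpoint as assumed here; the POST URL is hypothetical.
RANGE_URL = "https://api.pwnedpasswords.com/range/"
HYPOTHETICAL_POST_URL = "https://api.example.invalid/range"


def hash_prefix(password, length=5):
    """Return the first `length` hex characters of the SHA-1 of the password."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:length]


def build_get(prefix):
    # The prefix is part of the URL, so a default access log would record it.
    return urllib.request.Request(RANGE_URL + prefix, method="GET")


def build_post(prefix):
    # The prefix travels in the body; the logged URL is identical for every query.
    return urllib.request.Request(
        HYPOTHETICAL_POST_URL, data=prefix.encode("ascii"), method="POST"
    )


prefix = hash_prefix("password")
get_req = build_get(prefix)
post_req = build_post(prefix)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Nothing here touches the network; it only shows where the prefix ends
up in each request shape.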

My other concern has to do with the effects of using such a large
blacklist. This has the potential to frustrate users: "Every password I
try is already taken!" when in fact, if the password appears only a
few times in such a large corpus, it's probably pretty good. When users
get frustrated like this, they tend to do predictable things, akin to
appending ! to a password to meet composition rules. A good reference on
this:

Habib, Hana, Jessica Colnago, William Melicher, Blase Ur, Sean Segreti,
Lujo Bauer, Nicolas Christin, and Lorrie Cranor. “Password Creation in
the Presence of Blacklists,” 2017.
https://www.ece.cmu.edu/~lbauer/papers/2017/usec2017-blacklists.pdf

If a pattern emerges, attackers don't have far to go from the passwords
in this corpus to find the right answer. When the offline attack gets
the attacker close enough to enable an online attack, we have a problem.

I played with this a bit for PasswordsCon LV 2016 using Mark Burnett's
corpus of 10 million breached passwords, and formed the opinion that a
list of about 100,000 passwords (representing those appearing 3 or more
times in the corpus) was reasonable, both from a security and
lack-of-frustration perspective:

https://www.slideshare.net/jim_fenton/toward-better-password-requirements 
(see slides 17-22)
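
A frequency-thresholded list of that kind could be built along these
lines. This is only a sketch: the function name, the min_count
parameter, and the tiny in-memory corpus are hypothetical stand-ins,
not the actual procedure or data behind the slides.

```python
from collections import Counter


def build_blacklist(passwords, min_count=3):
    """Return the set of passwords appearing at least `min_count` times."""
    counts = Counter(passwords)
    return {pw for pw, n in counts.items() if n >= min_count}


# Hypothetical corpus; a real one would be read from a file of breached passwords.
corpus = ["123456"] * 5 + ["letmein"] * 3 + ["correct horse"] * 2 + ["x9!kQ"]
blacklist = build_blacklist(corpus)
print(sorted(blacklist))
```

Raising min_count shrinks the list and rejects fewer user choices;
lowering it toward 1 approaches the "everything ever breached" list
discussed above.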

I haven't seen whether Troy has published any frequency statistics for
this set. If not, I should ask him (or just do it myself).

I'm a little concerned that people are interpreting the following
wording in NIST 800-63B as a requirement that the list has to be as
comprehensive as possible:

> When processing requests to establish and change memorized secrets,
> verifiers SHALL compare the prospective secrets against a list that
> contains values known to be commonly-used, expected, or compromised.

I wrote that sentence, and that wasn't my intent (the key word here is
"commonly"). There's a balance, and making it harder to choose an
acceptable password isn't always good for security.

-Jim
 (personal opinions, not NIST's, of course)
