passwords - Re: Submitting Partial Password Hashes to Pwned Password Lookup

Message-ID: <CAJ9ii1E-X8Q764MP=JVVuRS4d_qkLb+4ZhbsBnKUcuE+w9XM6w@mail.gmail.com>
Date: Wed, 14 Mar 2018 16:40:53 -0400
From: Matt Weir <cweir@...edu>
To: passwords@...ts.openwall.com
Subject: Re: Submitting Partial Password Hashes to Pwned Password Lookup

Jim,

    Thanks for your reply. I think we all appreciate an author of a
standard explaining their intent! I started a side-tangent reply to
the blacklist requirement in NIST’s 800-63B but that might warrant a
different e-mail thread.

Breaking your reply down, it sounds like there are open questions
about several aspects of the Pwned Passwords list:
1)      Risk if a potential adversary gains access to the partial hashes
2)      Ways to minimize the risk of an adversary linking partial hash
requests to users/sites
3)      Understanding the value of the API to the defender (for example,
password strength checks, blacklists, etc.)

I know several researchers who are considering #1. Saying there’s risk
inherent in an attacker learning the first five characters of your
password hash shouldn’t be too contentious. The question is, how much
risk does that pose for human-memorable passwords? Side note: if the
password was randomly generated, is there value in using the service?
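
To make #1 concrete, here is a rough sketch of what a client reveals
and of the frequency argument in Jim's quoted message below. The
sample password and the helper names (split_hash, top_hash_share) are
made up for illustration; the counts (47205 occurrences for the top
hash, 53006 password instances across the 475 hashes sharing the
prefix) are the ones Jim quotes.

```python
import hashlib
from typing import Tuple


def split_hash(password: str, prefix_len: int = 5) -> Tuple[str, str]:
    """Return (prefix, suffix) of the uppercase hex SHA-1 of a password.

    Only the prefix would be sent to the lookup service; the suffix is
    compared locally against the returned candidates.
    """
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:prefix_len], digest[prefix_len:]


def top_hash_share(top_count: int, total_instances: int) -> float:
    """Fraction of all password instances in a bucket held by the top hash."""
    return top_count / total_instances


# Illustrative password only, not taken from the thread.
prefix, suffix = split_hash("example-password")
print(f"sent to service: {prefix} (kept local: {len(suffix)} hex chars)")

# Counts quoted from Jim's message for the bucket under discussion.
share = top_hash_share(47205, 53006)
print(f"top hash covers {share:.1%} of instances in the bucket")
```

If an attacker assumes users pick passwords in proportion to those
counts, the single most frequent hash in that bucket is the right guess
most of the time, which is why the bucket size alone overstates the
protection.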

For #2, I recommend reaching out to Troy about your comments! I
realize security is a cost factor, i.e. the cost of re-engineering the
API and potentially breaking existing implementations certainly isn’t
0. At the same time, this API is itself a security feature, so we
probably should be reducing the risks associated with using it. Worst
case, he can say no.
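
As a rough illustration of the GET vs POST point in Jim's quoted
message below, this sketch builds both kinds of request without
sending either. The GET URL follows the public range endpoint; the
POST endpoint, its URL, and its body format are purely hypothetical,
since nothing in this thread says the service accepts them.

```python
import urllib.request

# Public range endpoint: the hash prefix is part of the URL path.
RANGE_URL = "https://api.pwnedpasswords.com/range/"
# Hypothetical endpoint, assumed only for this comparison.
HYPOTHETICAL_POST_URL = "https://api.example.invalid/range"


def build_get(prefix: str) -> urllib.request.Request:
    """Prefix travels in the URL, so default URL logging would record it."""
    return urllib.request.Request(RANGE_URL + prefix, method="GET")


def build_post(prefix: str) -> urllib.request.Request:
    """Prefix travels in the body, which servers typically do not log."""
    return urllib.request.Request(
        HYPOTHETICAL_POST_URL,
        data=prefix.encode("ascii"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# The requests are only constructed here, never sent.
get_req = build_get("ABCDE")
post_req = build_post("ABCDE")
print("GET  URL :", get_req.full_url)
print("POST URL :", post_req.full_url)
print("POST body:", post_req.data)
```

The trade-off Jim mentions still applies: a POST body is generally not
cacheable by a CDN the way a GET URL is.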

For #3, I’ll admit I’m a bit blasé about the user frustration impact
of huge blacklists. That’s not because I don’t care about users
(being one myself), but because as long as it doesn’t become an
explicit requirement I trust most sites to side with usability.
That’s a roundabout way of saying I’d love to see more research
done in this area so the value of different blacklist strategies is
better understood. All the research I’ve seen has shown that
blacklists have a noticeable impact when protecting users against
online password guessing attacks, but I’ll admit my blacklist creation
advice is based as much (if not more) on gut feelings as on actual
studies and experiments.
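
One blacklist strategy that is easy to experiment with is the
frequency cutoff Jim describes in the quoted message below (keep only
passwords appearing 3 or more times in a corpus). A minimal sketch,
with a toy corpus and a hypothetical build_blacklist helper standing
in for a real breach corpus:

```python
from collections import Counter
from typing import Iterable, Set


def build_blacklist(corpus: Iterable[str], min_count: int = 3) -> Set[str]:
    """Keep only passwords seen at least min_count times in the corpus."""
    counts = Counter(corpus)
    return {pw for pw, n in counts.items() if n >= min_count}


# Toy data for illustration; a real run would stream a breach corpus.
toy_corpus = (
    ["123456"] * 5
    + ["password"] * 3
    + ["qwerty"] * 2
    + ["correct horse battery staple"]
)

blacklist = build_blacklist(toy_corpus)
print(sorted(blacklist))
```

Varying min_count against the same corpus gives a quick feel for how
fast the list shrinks, which is the kind of data I’d like to have
before giving blacklist advice.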

Matt

On Mon, Mar 12, 2018 at 10:10 PM, Royce Williams <royce@...hsolvency.com> wrote:
> Agree 100% with Jim - and I believe that the "commonly" concept is indeed
> being lost in context.
>
> Royce
>
> On Mon, Mar 12, 2018 at 2:01 PM, Jim Fenton <fenton@...epopcorn.net> wrote:
>>
>> On 3/12/18 1:19 PM, Matt Weir wrote:
>>
>> This e-mail has already grown too large as it is, but I’d be
>> interested in other people’s thoughts on this subject. Am I
>> misunderstanding the use of K-anonymity? How should we look at the
>> security of this approach?
>>
>>
>> Hi Matt,
>>
>> I have been looking for the appropriate venue to write something about
>> Pwned Passwords, so thanks for the prompt.
>>
>> tl;dr: Whether you call this k-anonymity or not, I have concerns.
>>
>> The concern with this, of course, is that some attack (perhaps an attack
>> on the web server or CDN) might give an attacker access to the queries and
>> associated IP addresses. With the IP addresses, it might be possible to
>> determine who the user (and their userID) is. With a full hash, it would be
>> possible to crack that hash to find the password.
>>
>> But these hashes also come with frequency statistics. So the attacker can
>> just start with the most likely hash (the one with 47205 occurrences in your
>> example) and work down the frequency list from there. There are 475 hashes
>> representing 53006 password instances in the page you cited, so it's likely
>> that you'd only need to crack that one hash value and that's the right one.
>> Your analogy with past use of Shannon entropy is indeed accurate.
>>
>> Another concern I have is with the API itself. Many web servers by default
>> log the URLs of their requests, so with this API it might log the IP address
>> of the requester and the prefix of the hash. I'm sure Troy has this all
>> covered but it sounds like he has an elaborate CDN to handle the flood of
>> requests he gets for this, and I hope they have this covered as well. A
>> better approach would be to use a POST request and put the hash prefix in
>> the body of the request (although I don't know what effect that might have
>> on the CDN).
>>
>> My other concern has to do with the effects of using such a large
>> blacklist. This has the potential to frustrate users: "Every password I try
>> is already taken!" when in fact if the password appears only a few
>> times in such a large corpus it's probably pretty good. When users get
>> frustrated like this, they tend to do predictable things, akin to appending !
>> to a password to meet composition rules. A good reference on this:
>>
>> Habib, Hana, Jessica Colnago, William Melicher, Blase Ur, Sean Segreti,
>> Lujo Bauer, Nicolas Christin, and Lorrie Cranor. “Password Creation in the
>> Presence of Blacklists,” 2017.
>> https://www.ece.cmu.edu/~lbauer/papers/2017/usec2017-blacklists.pdf.
>>
>> If a pattern emerges, attackers don't have far to go from the passwords in
>> this corpus to find the right answer. When the offline attack gets the
>> attacker close enough to enable an online attack, we have a problem.
>>
>> I played with this a bit for PasswordsCon LV 2016 using Mark Burnett's
>> corpus of 10 million breached passwords, and formed the opinion that a list
>> of about 100,000 passwords (representing those appearing 3 or more times in
>> the corpus) was reasonable, both from a security and lack-of-frustration
>> perspective:
>>
>> https://www.slideshare.net/jim_fenton/toward-better-password-requirements
>> (see slides 17-22)
>>
>> I haven't seen whether Troy has published any frequency statistics for
>> this set. If not, I should ask him (or just do it myself).
>>
>> I'm a little concerned that people are interpreting the following wording
>> in NIST 800-63B as a requirement that the list has to be as comprehensive as
>> possible:
>>
>> When processing requests to establish and change memorized secrets,
>> verifiers SHALL compare the prospective secrets against a list that contains
>> values known to be commonly-used, expected, or compromised.
>>
>> I wrote that sentence, and that wasn't my intent (the key word here is
>> "commonly"). There's a balance, and making it harder to choose an acceptable
>> password isn't always good for security.
>>
>> -Jim
>>  (personal opinions, not NIST's, of course)
>
>