crypt-dev - Re: Intentionally Increasing Collisions in Password Hashing Algorithms

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAConRVE67f6C1jO2tjkVT330Ro0U-aiw-H7MW-LnHw3Wc-BgmQ@mail.gmail.com>
Date: Thu, 13 Dec 2012 12:42:56 -0800
From: Steven Alexander <pdp11hacker@...il.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: Intentionally Increasing Collisions in Password
 Hashing Algorithms

Matt,

I think it's great that you were willing to put this out here knowing that
many people would quickly dismiss it as a crazy idea.

For users with strong passwords, this potentially makes their accounts more
vulnerable to online guessing on the weak site.  If a user has a long pass
phrase, an attacker is better off just guessing random passwords for that
account than to try to guess the actual passphrase.  Even so, this could
have little practical effect.  E.g., if you truncate the hash to 30 bits,
an attacker will still need half a billion guesses on average which isn't
practical for an online attack.

For users with weak passwords, I don't think truncation helps.  Assuming
again that you truncate the hash at 30 bits, an attacker will need half a
billion guesses on average to get in with a random password.  But, an
attacker isn't going to use random passwords.   He's going to guess
popular/probable passwords.  So, let's pretend the attacker tries 50,000
guesses against each of 1,000 different accounts for a total of 50 million
guesses.  The attacker has only about 10% chance of getting in with any of
these guesses by random chance (i.e. a collision that is not the real
password).  In reality, however, he might succeed in getting access to 200
accounts.  With a very high probability, every one of those 200 passwords
is a correct guess (not just a collision).

In Defuse's post, he says that the stored hash should be B-bits long where
B is less than the entropy of the password.  Assuming, for simplicity, that
each of the 200 compromised accounts in the previous scenario used 7
character alphanumeric passwords, their Shannon entropy is about 36 bits.
If we truncate the hash at 30 bits, this would seem to satisfy the
requirement that B is less than the entropy of the password.
Unfortunately, the guessing entropy is much less than the Shannon entropy.
In order to truncate the hash to less than the guessing entropy, we'd have
to cut it off at something less than 16 bits.  At 15-bits, an attacker
would need about 16k guesses (on average) to get into an account using
random guesses.  I think that's pretty unacceptable, even for a low
security site.

Using blacklisting would change this somewhat.  With a large blacklist, you
should be able to get a big increase in the guessing entropy of the
passwords.  Unfortunately, I think that blacklisting weakens the case for
truncation since it lessens the possibility of password reuse across sites.

Regards,

Steven

On Wed, Dec 12, 2012 at 3:03 PM, Matt Weir <cweir@...edu> wrote:

> So this is a continuation of a rather long Twitter thread between
> Solar, @DefuseSec and me, (apparently Twitter is not IRC), talking
> about increasing collisions in certain password hashing algorithms as
> a defensive measure. As a quick disclaimer, this very well might be a
> horrible idea, (I know when people first hear about it that’s their
> initial reaction). That being said, it is something that I feel is at
> least worth looking into. Also i'd like to point people to
> @DefuseSec's blogpost on the subject:
>
> https://defuse.ca/blog/2012/12/increasing-collisions-in-password-hashing/
>
> Quick Definitions:
> Offline Attack: The attacker has access to the password hash. They can
> make as many guesses as their resources allow them to. An example of
> this would be using John the Ripper/HashCat to crack BCrypt hashes.
> Online Attack: The attacker does not have access to the password hash.
> The defender can limit the number/rate of guesses the attacker can
> make. An example of this would be an attacker using THC Hydra to
> submit password guesses to a web login page.
>
> Background/Scope:
> Just to get this out of the way at the start. Increasing collisions is
> in fact a horrible idea if your goal is to defend against offline
> attacks. Under no circumstances should this be used for encryption
> applications.
>
> So taking a step back, there are certain fundamental truths when it
> comes to the current state of password security. Users will choose
> weak passwords; users will reuse passwords; and many sites will not
> implement memory hard or computationally hard password hashing
> algorithms. Now some users do choose strong passwords, many users
> avoid reusing passwords for important sites, and some sites do use
> strong hashing functions. But there’s enough examples of that not
> being the case I don’t think I have to spend too much time saying this
> is a major problem. If it wasn't a problem then tools like JtR and
> Hashcat would be useless and no-one would worry when LinkedIn’s
> password hashes were stolen ;p
>
> Now there’s certainly good tactics to combat this problem. Two-factor
> authentication, password vaults like Keepass, hashing algorithms like
> BCrypt and SCrypt, user training, etc. That being said, there’s
> reasons why we haven’t solved the password problem yet. Many of these
> techniques like password vaults are only slowly gaining user
> acceptance. On the other hand blaming users for picking weak passwords
> has gained high acceptance but that doesn't seem to be working very
> well ;p Strong hashing algorithms certainly can scale up to websites
> that support millions of users, (Twitter uses BCrypt). But going off
> Solar and Simon Marechal’s slides from the Password^12 conference when
> talking about phpass, “The portable hashes are very simple, which was
> key to phpass’ acceptance. More elaborate portable hashes would likely
> not be accepted; This may be something to try for a next generation
> phpass now that the foot is in the door”. My interpretation of that,
> (which is also based on my personal experiences), is there is going to
> be a lot of resistance to widely deploying computationally expensive
> hashing algorithms. Here's the link to their talk btw:
>
> http://www.openwall.com/presentations/Passwords12-The-Future-Of-Hashing/
>
> This brings us to password hash collisions. Ideally we could just
> focus on making password hashes as expensive as possible to compute
> while training users to choose strong/unique passwords to make
> cracking cost prohibitive. Since there is a lot of pushback on this,
> I've been looking at defensive techniques that might have an
> asymmetric cost for an attacker vs. the defender. A good example of
> such a technique is password salts. They have almost no impact on the
> defender, but have a huge impact on an attacker who is targeting
> multiple users.
>
> Now one issue is that users have passwords for many websites where
> their account has little value to an attacker, but the users then
> reuse their passwords for sites that do contain valuable info.
> Attackers exploit this by breaking into low value sites, stealing the
> password hashes, (hopefully they are hashed…), cracking them and then
> attempting to use the stolen credentials against high value sites.
> Basically they are taking advantage of the fact that offline attacks
> against weak hashes are fairly cheap while online attacks can be
> comparatively quite expensive, (you have to deal with rate limiting,
> IP blacklists, captchas, etc). Now online attacks are certainly
> popular as well, (just look into all the SSH bruteforcing going on),
> but targeting password reuse is popular for a reason. So it would be
> nice if we could raise the cost of checking if a cracked password is
> valid on a different website, (basically adding an “online” component
> to an “offline” cracking attack).
>
> That’s why I’m looking into intentionally increasing collisions in
> password hashes for low value websites/accounts. If we can insert some
> level of uncertainty into the hash cracking process, (does this guess
> really match the plaintext password the user selected?), it can raise
> the cost for an attacker when it comes to attacking other sites. For
> example, now attackers will try a username+password combo along with a
> few variations, (such as incrementing numbers at the end of the
> password), but if that doesn't work they assume the user didn't reuse
> their password and they move on. The best description I've seen of
> this based on real world data is:
>
> https://www.guildwars2.com/en/news/mike-obrien-on-account-security/
>
> With collisions, the attacker doesn't know if the password they
> cracked is the user’s real password. This means when they attempt to
> reuse it and fail the attacker has a couple of choices:
> 1)      Proceed as normal, (depending on the probability of a collision,
> the attacker’s first match still has a good chance of being the user’s
> password). The advantage for a defender is, A) If the attacker did
> crack the user’s password then it would have been cracked regardless
> B) If it is a collision the attacker will fail to access the user’s
> account even if the user reused their password.
> 2)      If the reused credentials don’t allow access, continue their
> cracking session until they find another match and then try the new
> credentials, (they can queue up possible passwords before checking
> them). Repeat until they find the correct password, or they run out of
> time, (they decide to stop). Not only is there the additional cost of
> checking collisions, but if a user did not reuse their password the
> attacker doesn't know that. Aka the attacker may waste time
> cracking/checking passwords long beyond when they would normally stop
> if there weren't collisions.
>
> One other advantage with password hash collisions is they provide
> deniability of the plaintext value. If someone’s password is
> embarrassing, such as “I hate my boss”, this provides them the ability
> to claim that their password was really something else if their
> password hash is disclosed.
>
> Now unlike salts, there are downsides to the defender for this
> approach. The first is that it increases a user’s vulnerability to
> online guessing attacks. AKA, an attacker’s guess might collide with
> the user’s password allowing the attacker to log in. That’s not a
> minor concern and I’ll come back to that later. The second problem is
> that for this approach to have any impact it will make it trivial for
> an attacker to find a collision when performing an offline attack.
> Basically if an attacker has access to the hash they also have access
> to that user’s account. That’s not something to discount and it was
> the subject of a lot of the Tweets between Solar and myself earlier ;p
>
> Before getting back to those trade-offs I’d like to describe some
> ideas about how to increase hash collisions in a what I hope is a
> controlled fashion.  For example, there are many different ways to
> increase collisions in popular password hashing functions. The key
> then is to select a method that does not leak additional info about
> the plaintext password while at the same time allowing the defender to
> estimate the number of collisions that are likely to occur.
>
> This rules out increasing collisions by mangling the plaintext
> password before hashing. An unintended example of this approach can be
> seen in old DES based Crypt(3) hashes which truncate user passwords
> after 8 characters. While this certainly increases collisions, (for
> example ‘password’ and ‘password123’ would share the same hash), it
> also allows an attacker to target only the first 8 characters of the
> password instead of the entire password. While there are some
> advantages to this approach, (the attacker doesn't know whether to try
> ‘password’ or ‘password123’ when reusing the credentials), those
> advantages are more than outweighed by the fact that an attacker can
> learn about the beginning portion of a password without having to
> guess the entire password.
>
> These requirements also disqualifies modifying the internal operation
> of popular hashing algorithms, or building new collision prone hashing
> algorithms from scratch. Beyond the fact that such an approach is a
> bad idea in general, such changes or new hashing algorithms may
> unintentionally leak information about the plaintext password. In
> addition, it is unlikely without a detailed study of the new
> algorithms that the number of collisions can be accurately predicted.
> This is a longer way of saying that while it is theoretically possible
> a new collision prone hashing algorithm could be developed, the
> chances of it going catastrophically wrong are not trivial and there
> is no reason to do so in the first place.
>
> Therefore the approach I’m currently investigating is to create
> collisions by truncating the results of existing password hashing
> algorithms. First of all this approach does not leak any additional
> information about the underlying password in an offline attack. I’m
> working on a formal proof to show this but the colloquial example is
> if this approach did leak additional info, then attackers would
> already be doing it. Note, I’m specifically referencing offline
> attacks. In an online attack it does leak data since it can let the
> attacker know about collisions if they can guess them, (and as a side
> effect log in which might be a bigger problem…). Still the knowledge
> gained about the underlying plaintext password learned based on the
> collisions would probably be limited if a standard/modern hashing
> algorithm was used and the password salt was large and remained
> unknown to the attacker.
>
> As far as being able to estimate the collision frequency goes, this is
> admittedly a trickier prospect. This is because it is dependent on the
> underlying hash function being used. The closer in behavior the
> underlying hash function is to a random oracle, then the easier it is
> to calculate the frequency of collisions. While no hash function
> exists that is a perfect random oracle, several are close enough in
> practice that intelligent guesses might be able to be made as to the
> probability that random collisions will occur, (yes I just suggested
> we make the cow spherical…)
>
> There are several other advantages to increasing collisions by
> truncating existing hashes. First, it should cause no additional
> performance impact compared to existing implementations. In fact it
> might trivially make authentication attempts quicker as only the first
> n bits need be compared between the hash from a user’s password
> submission and the saved hash on disk, (hopefully that speedup isn't
> noticeable or you need to be using a stronger hash!) Second, this
> should limit any additional side channel leaks compared to the
> existing hashing setup as the added work, (the truncation), is done on
> a fixed length hash for a fixed length value for every single password
> hash, regardless of the underlying plaintext. Finally, and perhaps
> most importantly, it is easy to upgrade to this from existing
> deployments. You simply take whatever hashing algorithm is being used
> and truncate it to the desired length. Since this is not dependent on
> the underlying plaintext password, site administrators don’t have to
> deal with two sets of user hashes, one for users who have logged in
> since the upgrade, and one for users who haven’t, (note if you want to
> avoid collisions with a blacklist of banned passwords it does require
> some additional work though).
>
> Now one consideration is you probably don’t want a user’s strong
> password to collide with passwords that are commonly used in online
> attacks. Aka if a user’s password collided with ‘123456’ that could be
> a serious problem. One solution for this when the user creates a new
> password, (or logs in with an existing password and the plaintext is
> available), is to check the password’s hash against the hash for those
> blacklisted passwords. If a collision occurs, change the salt, (You
> are using a salt right!?), and run a check with the new salt. Repeat
> until there are no collisions with blacklisted passwords. Now this is
> more complicated if you are upgrading your hashing scheme or
> blacklists, and don’t have access to the user’s plaintext passwords.
> There’s a couple of ways to deal with this, but one option might
> simply be to “hash the hash” and make that part of the new hashing
> algorithm going forward, (or switch back once the user logs in again).
> It also helps if the same blacklist is used to prevent users from
> actually picking those banned passwords.
>
> Oh, also a unique long and random salt is required. If different sites
> share the same collisions it kind of negates the advantages of this
> approach ;p
>
> So this leaves a lot of interesting questions. For example where
> should this be used and on what accounts? Aka you might not want to
> use this on webmaster accounts while using it for normal users. Also
> you may not want to use it for accounts that contain sensitive info as
> the assumption that if an attacker has the hash then they have access
> to the user’s account might not hold true. Then there’s the basic
> question of how likely should collisions be? Plus it's still left to
> be determined if the advantages of this approach outweigh all the
> negatives even for unimportant accounts.
>
> …. But I’m way past my 140 character limit right now and to be
> perfectly honest I don’t have good answers for these questions yet so
> I’ll leave answering them to other people and future e-mails ;p
>

Content of type "text/html" skipped
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.