Date: Sun, 2 May 2021 23:21:34 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: source of information for John's charset files

On Sun, May 02, 2021 at 10:29:29AM -0800, Royce Williams wrote:
> On Sun, May 2, 2021 at 9:50 AM Solar Designer <solar@...nwall.com> wrote:
>
> > (I had heard folks cracked almost the entire HIBP set by downloading and
> > testing against it various lists of breached passwords. After all, HIBP
> > is supposed to only contain passwords that were breached or leaked in
> > plaintext, so if Troy could compile this collection then others could as
> > well. However, for my test above I only used what was crackable without
> > usage of plaintext leaks beyond RockYou.)
>
> Just to make sure that everyone's aware, it wasn't just a matter of
> acquiring the component breaches. Many other techniques were needed to
> fully "recover" the plains for the HIBP hashes as published. Many of them
> are not "real-world" passwords - they're full of nested hashes, conversion
> errors, HTML escapes, truncations, untrimmed separators, and many other
> non-password artifacts. And even after reverse-engineering those, some
> remain. Just something to keep in mind when measuring cracking success
> rates against that corpus, or trying to use that corpus as a wordlist for
> other attacks.

Thank you, Royce.

> For more detail, CynoSure Prime and m33x and I did some work on the first
> couple of HIBP releases, and wrote up the results here:
>
> https://blog.cynosureprime.com/2017/08/320-million-hashes-exposed.html
>
> Hard to believe it was four years ago. :)

This appears to show that at least the nested hashes are a small
minority of the total. Also, being of unusually high lengths for
passwords they wouldn't affect incremental mode much since its
statistics are mostly per-length. And they're easy to exclude.
A few other things you list are really bad for this use, indeed, but
again it matters how common they are in that corpus.

Anyway, I just ran some tests the other way around - "cracking" RockYou
passwords. I didn't try excluding RockYou itself from the training sets
here - can't do that while including our current .chr files in the
comparison. So this is in-sample testing, which is generally a wrong
thing to do, but with that in mind here are the results for different
training sets (all are for incremental mode and 1 billion candidates):

RockYou with dupes - 20.2%
RockYou unique - 21.9%
HIBPv7 cracked - 17.9%

The percentages cracked are those of RockYou unique.

Not surprisingly, RockYou is best fit for itself. HIBP is an acceptable
fit as well. It could have potentially performed better than RockYou on
this test due to its larger size, but as we can see that was not enough
to overcome it not being such a perfect fit as RockYou itself. So this
is inconclusive.

Royce (and others), please feel free to try generating .chr files from
RockYou vs. HIBP too, and run them on real-world test sets you have and
share your results here.

Thanks!

Alexander
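
For anyone who wants to repeat this kind of comparison, here is a
minimal Python sketch of the two steps discussed above: dropping
hash-like entries from a training list, and measuring what percentage of
a unique test set is hit within a fixed candidate budget. This is not
how the numbers in this message were produced. The function names and
the "hash-like" heuristic (a bare hex string of 32, 40 or 64 characters)
are assumptions made for illustration only. The candidate stream is
assumed to be any iterable of strings, for example lines read from a
cracker's candidate output.

```python
import re
from itertools import islice

# Assumption: a "nested hash" shows up as a bare hex string of MD5, SHA-1
# or SHA-256 length. Real corpora may need other rules.
HASH_LIKE = re.compile(r"[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64}")


def drop_hash_like(passwords):
    """Yield only the entries that do not look like a hex-encoded hash."""
    return (p for p in passwords if not HASH_LIKE.fullmatch(p))


def percent_cracked(candidates, targets, budget):
    """Percentage of unique targets hit by the first `budget` candidates."""
    remaining = set(targets)          # duplicates collapse here
    total = len(remaining)
    if total == 0:
        return 0.0
    for guess in islice(candidates, budget):
        remaining.discard(guess)
        if not remaining:             # everything cracked, stop early
            break
    return 100.0 * (total - len(remaining)) / total


if __name__ == "__main__":
    # Toy data standing in for a real test set and candidate stream.
    targets = ["123456", "password", "123456", "iloveyou", "monkey"]
    candidates = ["123456", "qwerty", "password", "monkey", "iloveyou"]
    print(percent_cracked(candidates, targets, budget=3))
```

Counting against the unique test set matches how the percentages above
are stated. Running the same function with the same budget for each
training set keeps the comparison like-for-like.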