Date: Fri, 5 Feb 2010 04:50:20 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Replacement for all.chr based on "Rock You" Passwords. URL inside.

Minga -

Thank you for posting this! I was hoping someone would do it.

On Wed, Feb 03, 2010 at 05:21:29PM -0600, Minga Minga wrote:
> I dont exactly remember how/when all.chr was created, and I have no
> idea the last time it was updated, ...

It was last updated in December 2005, shortly before the JtR 1.7
release. Most of the input data was much older, though - mid-1990s.

> Now, I have many opinions about the passwords from the RockYou list.
> They are NOT representative of "real" passwords by trained users in
> corporate environments. But they ARE representative of idiots on the
> Internet. And I guess thats a good enough place to start, as any, for
> the default behaviour of JtR. I propose the all.chr update because we
> cannot continue to use and propagate a .CHR file that is so outdated
> (assuming it is?).

The .chr files included with JtR are old (and are based on data that is
even older), but I am not convinced they're outdated. Has there been
much of a change in users' choice of passwords in the last 10-15 years?
I think the average password became a little bit stronger (only a little
bit, unless a password policy is enforced), but I also think that the
relative frequencies of characters (as well as digraphs and trigraphs)
remained mostly the same. Perhaps the change in average password
complexity will be reflected in the "cracking order" table in the
"header" of a .chr file, but do you really spot this change with the
RockYou passwords (which are likely biased towards weaker ones)?

If you look at password.lst prior to my last update (e.g., take the
revision from JtR 188.8.131.52), it matches RockYou's top 100 as
published by Matt reasonably closely, despite the almost 15 years'
difference. It missed a total of 15 passwords from RockYou's top 100.
One of those was "rockyou" - likely not that common overall. 4 are in
fact very common nowadays. The remaining 10 were somewhat common (not
found on another recent top 250 list). So I'd say that we had an 85% to
96% coverage of common passwords with a mostly 15-year-old list (85% if
all 15 misses count, 96% if only the 4 very common ones do). Yes, that
list was longer (slightly over 3,000 entries), but most of the passwords
that were on RockYou's top 100 were also closer to the beginning of
password.lst.

You might want to read my verbose commit messages for the recent
password.lst updates here:

http://cvsweb.openwall.com/cgi/cvsweb.cgi/Owl/packages/john/john/run/password.lst

Thus, I think that the primary advantage of the RockYou list is not that
it is newer (although this is an advantage), but rather that it is
larger and more complete. I mean that previously we had to work with
hashes, of which only a certain percentage - the weaker ones - were
cracked. This resulted in some bias towards weaker passwords. Also,
passwords longer than 8 characters were almost non-existent, for several
reasons: the traditional crypt(3)'s limitation (and most of the hashes
were of this type), those passwords being stronger (so fewer of them
were cracked and could be used as input for .chr files), and those
passwords being less common (OK, I admit that there has been some change
in the percentage of longer passwords - from negligible to just small).

BTW, I wouldn't call someone using a weak password on a website an
"idiot". This depends on the person's use of the account, as well as
age, experience with computers, and perception of risk. This does not
necessarily suggest a low IQ, although maybe some correlation exists.

> Since the .chr created from the 'RockYou' list - can NOT be used
> to re-create the exact list of passwords, it is not a disclosure of
> personal information (up for debate). Therefore, I make the assumption
> it is safe for use.
>
> So what KoreLogic did was, obtained the list, cleaned up the list,

Can you please describe the cleanups you made to the list? Maybe post a
script that you used?

> obtained a unique list of passwords from the list (14,249,979 in total)

BTW, the input data for the .chr files included with JtR contained
multiple instances of common passwords. The difficulty was in avoiding
duplicates that resulted from passwords set by a specific person or on a
specific system, yet including those that were genuine common passwords.
Producing the password.lst file involved a similar difficulty. The
primary way to address this was to only include duplicates that were
found in unrelated input sets (IIRC, I required presence in 3+ input
sets for inclusion into the final password.lst), but to include the full
number of them if so (that is, also include duplicates from within the
same input set, with some exceptions).

You might want to re-introduce some repeated passwords into your unique
list - those that are also found on other unrelated lists - or you could
just include all, maybe with some manual filtering of very common yet
"spurious" ones (e.g., "rockyou" is questionable in this case).

> and created a .CHR file based on this list. We are now publishing this
> new .chr file for everyone to use.

Thank you! I was hoping someone would dare to do that.

I am going to include your files under john/contrib/ in the Openwall FTP
archive. You will likely need to release some updates, though -
considering my input above and/or changes to JtR itself.

Speaking of the latter, I am going to re-work the "incremental" mode,
for the better indeed. I already have some test revisions of charset.c
and inc.c files that address one of the shortcomings of the current
approach (namely, its inability to increase the number of character
indices for each character position fully independently from the rest of
the positions).
Even if I maintain support for older .chr files for a while longer (with
some backwards compatibility code), it'd be beneficial to take advantage
of the new approach and implementation.

Also, you might want to generate multiple .chr files, with different
filters, like it is done for those included with JtR. Arguably, JtR
itself could be enhanced to perform some filtering like this while
cracking, and to do so in an efficient manner (skipping large chunks of
would-be-filtered candidate passwords at once), but this is tricky to
implement if the goal is to achieve the same effect that is currently
achieved with separate files. Specifically, it is not very difficult to
efficiently skip passwords not matching a certain reduced charset, but
it is more difficult to also use character, digraph, and trigraph
frequencies only based on passwords consisting _entirely_ of characters
from that reduced set. The latter is only easy to achieve by
pre-filtering when the .chr file is generated.

> In the next few months, KoreLogic will be posting a large amount of
> password-based research on our website. Mostly based around new
> techniques, new rules, and automation of large jobs to be run across
> multiple systems. KoreLogic will also be doing multiple presentations
> about Security Cons this year presenting our tools/rules/research
> in 2010 as well.

Sounds good.

On a related note, I am seriously considering actually dedicating some
of my time to start implementing built-in support for parallel
processing. Commercial demand for this could make a difference.

> Here is the CHR file, and the README associated with it including
> instructions for use, etc. If we don't want to replace all.chr -
> instructions are included for using rockyou.chr separately.
>
> http://www.korelogic.com/tools.html#jtr

I am not replacing the included all.chr with this yet, but I am willing
to consider doing something like that a bit later. I've mentioned some
reasons why not yet above.
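To illustrate the pre-filtering idea mentioned earlier in this message, here is a minimal Python sketch. It is not JtR code: the charset names and the `prefilter` helper are hypothetical, chosen only for illustration. It keeps just those input passwords that consist entirely of characters from a reduced set, so that any frequencies computed afterwards would be based on such passwords only.

```python
import string

# Hypothetical reduced character sets (illustrative names, not JtR's own).
CHARSETS = {
    "alpha": set(string.ascii_lowercase),
    "digits": set(string.digits),
    "alnum": set(string.ascii_lowercase + string.digits),
}

def prefilter(passwords, charset):
    """Yield only non-empty passwords made up entirely of chars in charset."""
    for pw in passwords:
        if pw and all(ch in charset for ch in pw):
            yield pw

# Made-up sample input, standing in for a real cleaned-up password list.
sample = ["password", "abc123", "12345", "P@ssw0rd", "letmein"]
alpha_only = list(prefilter(sample, CHARSETS["alpha"]))
digits_only = list(prefilter(sample, CHARSETS["digits"]))
alnum_only = list(prefilter(sample, CHARSETS["alnum"]))
```

Each filtered list would then be fed to .chr generation separately, one output file per filter. Note that "abc123" is dropped from the alpha set entirely rather than contributing its letters, which is the point of filtering on whole passwords.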
Meanwhile, I'd be very interested in test runs of JtR with rockyou.chr
vs. all.chr against some recent but unrelated password hash files. I'd
appreciate it if you and/or others ran such tests and posted the results
in here. Also, it'd be interesting to see the effect of (not) including
repeated passwords in the input set for .chr file generation.

Thanks again,

Alexander
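For anyone wanting to experiment with repeated passwords in the input set, the "3+ unrelated input sets" rule described above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not the script actually used for password.lst or the bundled .chr files: the `merge_input_sets` name and the `min_sets` parameter are hypothetical, and keeping less widespread passwords exactly once is an assumption made here for .chr input.

```python
from collections import Counter

def merge_input_sets(input_sets, min_sets=3):
    """Merge password lists, keeping repeats only for widely seen passwords.

    A password present in at least min_sets distinct input sets keeps its
    full count across all sets. Any other password is kept once (an
    assumption of this sketch), so repeats local to one system are dropped.
    """
    total = Counter()    # total occurrences across all input sets
    seen_in = Counter()  # number of distinct input sets containing it
    for pw_list in input_sets:
        total.update(pw_list)
        seen_in.update(set(pw_list))
    merged = []
    for pw, count in total.items():
        copies = count if seen_in[pw] >= min_sets else 1
        merged.extend([pw] * copies)
    return merged

# Made-up example data: three unrelated input sets.
sets = [
    ["123456", "123456", "rockyou", "qwerty"],
    ["123456", "qwerty", "hunter2"],
    ["123456", "hunter2", "hunter2"],
]
merged_counts = Counter(merge_input_sets(sets))
```

Here "123456" appears in all three sets and keeps all four occurrences, while "hunter2" appears three times but in only two sets, so it is reduced to a single copy.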