john-users - Crowd-sourcing statistics and rules

Message-ID: <CANWtx01WuOCYGPUqzJjLVKg2+_nMAfqr9Y4rWxW7t6bJ5NzoXg@mail.gmail.com>
Date: Sun, 15 Apr 2012 21:36:06 -0400
From: Rich Rumble <richrumble@...il.com>
To: john-users@...ts.openwall.com
Subject: Crowd-sourcing statistics and rules

With all this talk of pattern matching/finding, could it also be time
to look at updating JtR's rules and giving anonymous feedback on
rules? With the clients I audit, I don't see much variation:
password incrementing and using the company name or products in the
passwords are very evident. JtR records a lot of information about
each cracking session in the log file that could be useful, and we
might want to submit certain stats for the community. At a minimum we
could submit some pattern details that may tell others to look for
similar patterns or to try a particular rule set.
I'd submit that appending two to four digits to various dictionary
words is a common pattern, and 007 is common at the end. The old JtR
"leet" rules don't apply to most of my users: 4 = A has been replaced
with @ = a, 1 = L and ! = i, and $ = S as opposed to 5 = S. Naturally
0 = o and 3 = e still apply. The company name and/or its major brands
or items are often part of the users' passwords. The passwords I
don't crack are idioms or phrases (based on polling these users,
assuming they are being honest in their admission). Granted, I don't
do many slow hashes, mostly Windows LM/NTLM, raw-sha and raw-md5. Most
passwords are English in origin.
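
As a rough illustration of the patterns described above (two to four
appended digits, plus the newer substitutions), here is a minimal
Python sketch. It is not a JtR rule set and not how JtR implements its
rules; the function names, the choice to substitute every mapped
letter at once, and the sample word "acme" are assumptions made only
for the example.

```python
from itertools import product

# Substitutions reported in the message: @ for a, 1 for l, ! for i,
# $ for s, 0 for o, 3 for e. Applying all of them at once is an
# assumption; real users often substitute only some letters.
LEET = {"a": "@", "l": "1", "i": "!", "s": "$", "o": "0", "e": "3"}


def leet(word):
    """Replace every mapped letter (case-insensitively) with its symbol."""
    return "".join(LEET.get(ch.lower(), ch) for ch in word)


def digit_suffixes(min_len=2, max_len=4):
    """Yield every zero-padded digit string of min_len..max_len digits."""
    for n in range(min_len, max_len + 1):
        for digits in product("0123456789", repeat=n):
            yield "".join(digits)


def candidates(words):
    """Yield word, Capitalized word and leet word, each with a digit suffix."""
    for word in words:
        # dict.fromkeys removes duplicate bases while keeping their order.
        for base in dict.fromkeys((word, word.capitalize(), leet(word))):
            for suffix in digit_suffixes():
                yield base + suffix


if __name__ == "__main__":
    # "acme" stands in for a company or brand name.
    for i, cand in enumerate(candidates(["acme"])):
        if i >= 5:
            break
        print(cand)
```

Each base word produces 100 + 1,000 + 10,000 suffixed candidates, so
this only makes sense as a way to reason about the pattern, not as a
replacement for John's own rule engine.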

I think some work has been done on this previously, extracting data
from the log files, but I couldn't find it. I wonder if we could look
at our own (personal) statistics and/or submit them somehow
(anonymously), to see if we could detect more generalizations. It
would be interesting if we could even do "regions" and find
commonalities from geo-located data. Someone once determined that in
the southern US states soda-pop/cola/soft drinks are more often
generalized as "Pepsi", and in the northern states as "Coke". Would
we see a pattern like "biblical based" passwords being popular in the
Middle East and more sci-fi passwords on the West Coast of the USA?
That's a bit of a tangent/stream of consciousness... sorry :)

John/Jumbo could be patched, but I bet a script could be used just as
well to cut down the minutiae and create more succinct details:
0:00:01:19 - Rule #15: '-c )?a r l' accepted as ')?arl'       (cracked 353)
0:00:01:30 - Rule #16: '-: <* !?A l p' accepted as '<*!?Alp'      (cracked 8)
0:00:01:39 - Rule #17: '-c <* !?A c p' accepted as '<*!?Acp'      (cracked 31)
0:00:01:47 - Rule #18: '-c <* c Q d' accepted as '<*cQd'      (cracked 99)
0:00:01:56 - Rule #19: '-c >7 '7 /?u' accepted as '>7'7/?u'      (cracked 0)
0:00:01:56 - Rule #20: '>4 '4 l' accepted as '>4'4l'      (cracked 0)
0:00:02:06 - Rule #21: '-c <+ (?l c r' accepted as '<+(?lcr'      (cracked 9)
0:00:02:15 - Rule #22: '-c <+ )?l l Tm' accepted as '<+)?llTm'      (cracked 17)
....
0:09:30:07 - Trying length 7, fixed @1, character count 31 (cracked 446)
0:09:37:26 - Trying length 6, fixed @6, character count 47 (cracked 248)
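
A minimal Python sketch of the kind of script meant here, under
stated assumptions: it reads lines in exactly the condensed form shown
above and ranks the rules by their "(cracked N)" value. It assumes N is
the number of passwords attributed to that rule, which the excerpt
suggests but does not state. The regular expression and the function
names are illustrative and not part of JtR.

```python
import re

# Matches the condensed rule lines shown above, e.g.
# 0:00:01:19 - Rule #15: '-c )?a r l' accepted as ')?arl'   (cracked 353)
RULE_LINE = re.compile(
    r"^(?P<time>\d+:\d\d:\d\d:\d\d) - Rule #(?P<num>\d+): "
    r"'(?P<rule>.*)' accepted as '(?P<accepted>.*)'"
    r"\s+\(cracked (?P<cracked>\d+)\)\s*$"
)


def parse_rules(lines):
    """Return (rule number, rule text, cracked count) for each rule line."""
    rows = []
    for line in lines:
        m = RULE_LINE.match(line.strip())
        if m:  # non-rule lines (incremental mode, "....") are skipped
            rows.append((int(m["num"]), m["rule"], int(m["cracked"])))
    return rows


def rank(rows):
    """Sort rules by cracked count, most productive first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


SAMPLE = """\
0:00:01:19 - Rule #15: '-c )?a r l' accepted as ')?arl'       (cracked 353)
0:00:01:30 - Rule #16: '-: <* !?A l p' accepted as '<*!?Alp'      (cracked 8)
0:00:01:39 - Rule #17: '-c <* !?A c p' accepted as '<*!?Acp'      (cracked 31)
0:00:01:47 - Rule #18: '-c <* c Q d' accepted as '<*cQd'      (cracked 99)
0:00:01:56 - Rule #19: '-c >7 '7 /?u' accepted as '>7'7/?u'      (cracked 0)
0:00:01:56 - Rule #20: '>4 '4 l' accepted as '>4'4l'      (cracked 0)
0:00:02:06 - Rule #21: '-c <+ (?l c r' accepted as '<+(?lcr'      (cracked 9)
0:00:02:15 - Rule #22: '-c <+ )?l l Tm' accepted as '<+)?llTm'      (cracked 17)
0:09:30:07 - Trying length 7, fixed @1, character count 31 (cracked 446)
"""

if __name__ == "__main__":
    rows = parse_rules(SAMPLE.splitlines())
    total = sum(cracked for _, _, cracked in rows)
    for num, rule, cracked in rank(rows):
        print(f"Rule #{num:<3} cracked {cracked:>4}  {rule}")
    print(f"Total attributed to rules: {total}")
```

On the excerpt above this puts rule #15 first and shows that rules #19
and #20 contributed nothing, which is the sort of summary that could be
shared without revealing any passwords.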

I think finding out which rules are working for me personally could
save some time for others as well. It could be interesting to see what
happens if I re-run John on those same passwords minus the most
"successful" rules and compare. Perhaps for me using 0000-9999 gets me
far more passwords than the rule that does 19xx and 20xx dates/years,
and I don't want the overlap. Perhaps a rule that comes last in JtR
cracks some additional passwords that a similar, earlier rule misses,
and the last rule cracks more overall in perhaps the same amount of
time; then the earlier rule could be dropped and the wordlist wouldn't
have to be run through it as well. There may be improvements each of
us can make to achieve our goals, be it rule selection, rule order
and/or "latest trends" (like @ for a). There are a lot of variables
here, and some of the things I stated are moot if the wordlist is
small or the hash is very fast. Still, I'd be curious to grab more
stats not only from the passwords themselves but from the whole
session, which holds information we might all benefit from. Even if
it's not going to get you 5x more passwords, maybe it gets you the
10-20 really hard ones you've been going after.
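
To put a number on the overlap between the 0000-9999 suffixes and the
19xx/20xx year suffixes, here is a small Python check. It assumes both
rules append exactly four digits, which is how the ranges are described
above; the variable names are illustrative.

```python
# Every four-digit suffix from 0000 to 9999.
all_four_digit = {f"{n:04d}" for n in range(10000)}

# Year-style suffixes: 1900-1999 and 2000-2099.
years = {f"{century}{n:02d}" for century in ("19", "20") for n in range(100)}

overlap = all_four_digit & years

print(len(all_four_digit), len(years), len(overlap))
print("year suffixes fully covered:", years <= all_four_digit)
```

Every year suffix is already in the larger range, so running both
rules retries those candidates for each word in the wordlist.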
In closing my very long-winded email: “Statistics are like a bikini.
What they reveal is suggestive, but what they conceal is vital.” I
think it will ultimately be up to us as auditors/administrators to
create rules and find the patterns, but I bet we could all get a big
nudge from some small automation/script to get us started.
-rich