john-users - Re: Rules for realistic words

Message-ID: <CAJocqxPpB8nO1w-ZJeQY0QNF7WiTS4KZD9xc7vf4wmjGCrH==w@mail.gmail.com>
Date: Sat, 31 Dec 2011 10:24:12 -0600
From: Wesley Tansey <tansey@...utexas.edu>
To: john-users@...ts.openwall.com
Subject: Re: Rules for realistic words

Hi Alex,

Have you seen Markov mode?

http://openwall.info/wiki/john/markov

That seems to be more or less what you are describing in the first half of
your email.

Wesley

2011/12/31 Alex Sicamiotis <alekshs@...mail.com>

>
> From an analysis I conducted on a file of ~1500 DES passwords (max
> length 8), a mix of Greeklish (Greek words written in Latin letters)
> and English, the following came up:
>
> Very high frequency letters:
> a=850 i=597 o=584 e=525
>
> Medium to low frequency letters:
> s=498 r=472 n=418 t=405 l=366 m=277 p=247 c=211 d=201 k=193 g=159 u=148
> h=144 b=113 y=97 f=87
>
> Very low frequency letters:
> v=66 w=53 x=47 j=31 z=31 q=18
>
>
> Number frequency:
> 1=448 occurrences
> 2=293 occurrences
> 3=249 occurrences
> 9=219 occurrences
> 0=203 occurrences
> 4=185 occurrences
> 6=175 occurrences
> 5=174 occurrences
> 7=156 occurrences
> 8=132 occurrences
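
Counts like the ones quoted above could be reproduced with a short script. This is only a minimal Python sketch, assuming the cracked plaintexts are available as a list of strings; the function names `char_frequencies` and `top_charset` are made up for illustration and are not part of John the Ripper.

```python
from collections import Counter

def char_frequencies(passwords):
    """Count letters (case-folded) and digits over an iterable of plaintexts."""
    letters, digits = Counter(), Counter()
    for pw in passwords:
        for ch in pw.lower():
            if "a" <= ch <= "z":
                letters[ch] += 1
            elif "0" <= ch <= "9":
                digits[ch] += 1
    return letters, digits

def top_charset(letters, digits, n_letters=16, n_digits=4):
    """Build a reduced charset from the most frequent letters and digits."""
    top_l = "".join(ch for ch, _ in letters.most_common(n_letters))
    top_d = "".join(ch for ch, _ in digits.most_common(n_digits))
    return top_l + top_d
```

Fed with the letter and digit counts from the quoted message, `top_charset` would select `aioesrntlmpcdkgu` plus `1239`, which is the kind of 20-character set the message goes on to propose.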
>
> ...what this means is that a new method of brute forcing could be used.
>
> Currently it's something like
>
> 1) single
> 2) dictionary
> 3) dictionary with rules
> 4) incremental with Digits, Alpha, LanMan, All, from shorter lengths to
> longer ones.
>
> Now for the 26 letters of Alpha, it goes like 26x26x26x26x26x26x26x26 =
> 208.8 billion combos
> For the Alpha+Digits it goes 36x36x36x36x36x36x36x36 = 2.82 trillion combos
>
> What if there were character sets of frequently used letters as an
> intermediate step between dictionaries with rules and incremental with
> full character sets? For example, the top 16 letters and 4 digits = 20
> characters in total. In that case it's only 25.6 billion combos at 8
> chars length, and with multiple hashes it's always worth checking these
> first in order to crack them and speed up the rest. I think incremental
> mode already applies some sort of "more frequent first" ordering, but I
> don't know how optimized it is in relation to this. If it already covers
> this area, ignore this comment.
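
The keyspace figures in the quoted text are easy to check. A minimal Python sketch (the helper name `keyspace` is just for illustration; it counts candidates of exactly the given length, as the message does):

```python
def keyspace(charset_size, length):
    """Number of candidates of exactly `length` chars over a charset."""
    return charset_size ** length

for name, size in [("Alpha", 26),
                   ("Alpha+Digits", 36),
                   ("top 16 letters + 4 digits", 20)]:
    print(f"{name:28s} {keyspace(size, 8):>20,}")
# Alpha: 208,827,064,576; Alpha+Digits: 2,821,109,907,456;
# top 16 letters + 4 digits: 25,600,000,000
```

So the reduced 20-character set is roughly 8 times smaller than Alpha and about 110 times smaller than Alpha+Digits at length 8.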
>
> Another aspect that can be improved (not in cracking speed, but in
> getting the easier ones cracked) is to emulate how language is constructed.
> For example, Greek and Italian alternate a lot between consonants and
> vowels. This means that you can have a rule which goes like this:
>
> (V)owel
> (C)onsonant
> (B)oth+numbers+symbols
>
> 1-4 lengths are cracked in incremental
> From 5 char length onwards:
>
> VCVCV => italy
> CVCVC => begar
> VCVCB => epic6
> CVCVB => nike@
> VCVCVC
> CVCVCV
> VCVCVB
> CVCVCB
> VCVCVCV
> CVCVCVC
> VCVCVCB
> CVCVCVB
> VCVCVCVC
> CVCVCVCV
> VCVCVCVB
> CVCVCVCB
>
> By splitting words into human-like syllables, I achieved a hefty increase
> in effective cracking speed, because instead of 26x26x26... it goes like
> 18x8x18x8x18, which means enormously fewer combinations and no time spent
> on non-words like zzxaeseq.
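
The V/C/B templates above can be expanded mechanically. Here is a minimal Python sketch, assuming the 8-vowel and 18-consonant Greeklish sets from the quoted message; the contents of the B class are my assumption (lowercase letters, digits and ASCII punctuation), since the message does not spell them out, and the function names are illustrative only.

```python
import string
from itertools import product

VOWELS = "aehiouwy"                # 8 letters, Greeklish vowel set
CONSONANTS = "bcdfgjklmnpqrstvxz"  # 18 letters
# Assumption: "Both+numbers+symbols" taken as all lowercase letters,
# digits and ASCII punctuation.
BOTH = string.ascii_lowercase + string.digits + string.punctuation
CLASSES = {"V": VOWELS, "C": CONSONANTS, "B": BOTH}

def pattern_size(pattern):
    """Number of candidates a template such as 'CVCVC' expands to."""
    size = 1
    for symbol in pattern:
        size *= len(CLASSES[symbol])
    return size

def candidates(pattern):
    """Lazily yield every string matching the template."""
    for combo in product(*(CLASSES[s] for s in pattern)):
        yield "".join(combo)
```

With these sets, `pattern_size("CVCVC")` is 18x8x18x8x18 = 373,248, against 26^5 = 11,881,376 for plain five-letter brute force. The output of `candidates` could be printed one per line and fed to a cracker as a wordlist, for example through John's `--stdin` option.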
>
> (The following is a Greeklish example. You may see some letters listed as
> vowels which are consonants in English, but in Greeklish, for example, w is
> used phonetically as o; it stands for the letter omega.)
>
>
> [bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy]"
>
> [aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz]"
>
> [bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz]"
>
> [aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy]"
>
> In some cases it needs tweaking to account for two consonants or two
> vowels in some part of the word (for example peNTagon, aCRopolis, bicyCLe,
> AErodynamic), so a few variations of the above are necessary to cover a
> large percentage of words.
>
> An analysis of the English language and its linguistic patterns might give
> a significant increase in human-like words or composite words (that the
> dictionaries do not contain, like name&surname). Ideally, we could have a
> statistics program or an AI program extract rules covering 95%+ of the
> words in a given language, so that combinations could be based on this
> structure (with possible twists like adding stuff at the end).
> English is a bit more difficult to do in a letter-by-letter format
> compared to Greek/Italian but, ultimately, it's just more variations. A
> syllable approach (i.e. combos of one-, two- and three-letter sequences)
> might also be appropriate for English or other languages. For example,
> instead of combining words, we could combine ready-made syllables: the
> syllable MO + the syllable RE = the word MORE. The number of combinations
> compared to 26^8 will drop dramatically.
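
The syllable idea could be prototyped along these lines. This is a rough Python sketch, not an existing John feature: the syllable list is a hypothetical input (it would have to come from analysing real words), and `syllable_words` is an illustrative name.

```python
from itertools import product

def syllable_words(syllables, max_syllables, max_len=8):
    """Yield concatenations of 1..max_syllables syllables, capped at max_len.

    Note: with syllables of mixed lengths the same word can be produced
    more than once; deduplicate if that matters.
    """
    for n in range(1, max_syllables + 1):
        for combo in product(syllables, repeat=n):
            word = "".join(combo)
            if len(word) <= max_len:
                yield word

# Hypothetical toy syllable list, just to show the shape of the output.
words = list(syllable_words(["mo", "re", "ta"], max_syllables=2))
print("more" in words, len(words))
```

With S syllables and up to k of them per word, the candidate count is at most S + S^2 + ... + S^k, so whether this beats 26^8 depends on how small the syllable list can be kept.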
>
> Have a great 2012...
>