john-users - RE: Rules for realistic words

Message-ID: <BLU159-W51E89A5A815762A846AC77A4930@phx.gbl>
Date: Sat, 31 Dec 2011 17:57:38 +0000
From: Alex Sicamiotis <alekshs@...mail.com>
To: <john-users@...ts.openwall.com>
Subject: RE: Rules for realistic words


Interesting... I saw that the newer versions of John include such an option, but I never tried it or looked it up. Thanks.


> Date: Sat, 31 Dec 2011 10:24:12 -0600
> From: tansey@...utexas.edu
> To: john-users@...ts.openwall.com
> Subject: Re: [john-users] Rules for realistic words
> 
> Hi Alex,
> 
> Have you seen Markov mode?
> 
> http://openwall.info/wiki/john/markov
> 
> That seems to be more or less what you are describing in the first half of
> your email.
> 
> Wesley
> 
> 2011/12/31 Alex Sicamiotis <alekshs@...mail.com>
> 
> >
> > From an analysis I conducted on a file of ~1500 DES passwords (max length
> > 8), a mix of Greeklish (Greek words written in Latin letters) and English,
> > the following came up:
> >
> > Very high frequency letters:
> > a=850 i=597 o=584 e=525
> >
> > Medium to low frequency letters:
> > s=498 r=472 n=418 t=405 l=366 m=277 p=247 c=211 d=201 k=193 g=159 u=148
> > h=144 b=113 y=97 f=87
> >
> > Very low frequency letters:
> > v=66 w=53 x=47 j=31 z=31 q=18
> >
> >
> > Number frequency:
> > 1=448 occurrences
> > 2=293 occurrences
> > 3=249 occurrences
> > 9=219 occurrences
> > 0=203 occurrences
> > 4=185 occurrences
> > 6=175 occurrences
> > 5=174 occurrences
> > 7=156 occurrences
> > 8=132 occurrences
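
A tally like the one above can be reproduced with a few lines of Python.
This is only a sketch: the `sample` list is a made-up stand-in for the real
file of cracked plaintexts, and upper case is folded to lower case, which the
original analysis may or may not have done.

```python
from collections import Counter
import string


def char_frequencies(passwords):
    """Return (letter_counts, digit_counts) over all passwords, case-folded."""
    letters, digits = Counter(), Counter()
    for pw in passwords:
        for ch in pw.lower():
            if ch in string.ascii_lowercase:
                letters[ch] += 1
            elif ch in string.digits:
                digits[ch] += 1
    return letters, digits


# Hypothetical sample; the real input would be the list of cracked plaintexts.
sample = ["maria1", "nikos12", "italia", "pass2011"]
letters, digits = char_frequencies(sample)

# Most frequent first, same ordering as the lists in the post.
print(letters.most_common(4))
print(digits.most_common(3))
```

Symbols are ignored here; a third counter could be added the same way.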
> >
> > What this means is that a new method of brute forcing could be used.
> >
> > Currently it's something like
> >
> > 1) single
> > 2) dictionary
> > 3) dictionary with rules
> > 4) incremental with Digits, Alpha, LanMan, All, going from shorter to
> > longer lengths.
> >
> > Now for the 26 letters of Alpha, it goes like 26x26x26x26x26x26x26x26 =
> > 208.8 billion combos
> > For the Alpha+Digits it goes 36x36x36x36x36x36x36x36 = 2.82 trillion combos
> >
> > What if there were character sets of frequently used letters as an
> > intermediate step between dictionaries with rules and incremental with
> > full character sets? For example, the top 16 letters and top 4 digits = 20
> > characters in total. In that case it's only 25.6 billion combos at length
> > 8, and with multiple hashes it's always worth checking these first, to
> > crack the easy ones and speed up the rest. I think incremental mode
> > already applies some sort of "more frequent first" ordering, but I don't
> > know how optimized it is in relation to this. If it already covers this
> > case, ignore this comment.
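
The arithmetic above can be checked with a short Python sketch. The counts
are copied from the frequency lists earlier in the post; the helper names
`top_chars` and `keyspace` are made up for illustration, and only candidates
of exactly the given length are counted.

```python
# Letter and digit counts as reported in the post.
letter_counts = {
    "a": 850, "i": 597, "o": 584, "e": 525, "s": 498, "r": 472, "n": 418,
    "t": 405, "l": 366, "m": 277, "p": 247, "c": 211, "d": 201, "k": 193,
    "g": 159, "u": 148, "h": 144, "b": 113, "y": 97, "f": 87, "v": 66,
    "w": 53, "x": 47, "j": 31, "z": 31, "q": 18,
}
digit_counts = {
    "1": 448, "2": 293, "3": 249, "9": 219, "0": 203,
    "4": 185, "6": 175, "5": 174, "7": 156, "8": 132,
}


def top_chars(counts, n):
    """Return the n most frequent characters, most frequent first."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    return "".join(ranked[:n])


def keyspace(charset_size, length):
    """Number of candidates of exactly `length` characters."""
    return charset_size ** length


reduced = top_chars(letter_counts, 16) + top_chars(digit_counts, 4)
print(reduced)                    # aioesrntlmpcdkgu1239
print(keyspace(len(reduced), 8))  # 25600000000     (25.6 billion)
print(keyspace(26, 8))            # 208827064576    (208.8 billion)
print(keyspace(36, 8))            # 2821109907456   (2.82 trillion)
```

At length 8 the 20-character set is roughly 8 times smaller than Alpha and
roughly 110 times smaller than Alpha+Digits.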
> >
> > Another aspect that could be improved (not cracking speed, but getting
> > the easier passwords out first) is emulating how language is constructed.
> > For example, Greek and Italian alternate a lot between consonants and
> > vowels. This means that you can have a rule which goes like this:
> >
> > (V)owel
> > (C)onsonant
> > (B)oth+numbers+symbols
> >
> > 1-4 lengths are cracked in incremental
> > From 5 char length onwards:
> >
> > VCVCV => italy
> > CVCVC => begar
> > VCVCB => epic6
> > CVCVB => nike@
> > VCVCVC
> > CVCVCV
> > VCVCVB
> > CVCVCB
> > VCVCVCV
> > CVCVCVC
> > VCVCVCB
> > CVCVCVB
> > VCVCVCVC
> > CVCVCVCV
> > VCVCVCVB
> > CVCVCVCB
> >
> > By splitting words into human-like syllables, I achieved a hefty increase
> > in effective cracking speed, because instead of 26x26x26... it goes like
> > 18x8x18x8x18, which means far fewer combinations and no time spent on
> > non-words like zzxaeseq.
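
To put numbers on that, here is a small Python sketch that multiplies out
the class sizes for a given pattern. It uses the 18-consonant / 8-vowel
Greeklish split from the character classes quoted further down in the post;
`mask_keyspace` is a made-up name, and the (B) class is left out because its
exact contents are not given.

```python
from math import prod

# Greeklish split used in the post: w and y count as vowels.
CLASSES = {
    "C": "bcdfgjklmnpqrstvxz",  # 18 consonants
    "V": "aehiouwy",            # 8 vowels
}


def mask_keyspace(mask):
    """Number of candidates for a pattern such as 'CVCVC'."""
    return prod(len(CLASSES[symbol]) for symbol in mask)


print(mask_keyspace("CVCVC"))     # 373248, i.e. 18x8x18x8x18
print(26 ** 5)                    # 11881376
print(mask_keyspace("CVCVCVCV"))  # 429981696
print(26 ** 8)                    # 208827064576
```

So one 5-letter pattern is about 32 times smaller than the full 26^5 space,
and one 8-letter pattern about 486 times smaller than 26^8. Each length needs
both the CV... and VC... variants, which doubles these counts.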
> >
> > (The following is a Greeklish example. Some letters are listed as vowels
> > that are consonants in English: in Greeklish, for example, w is used
> > phonetically as o, since it stands for the letter omega.)
> >
> >
> > [bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy]"
> >
> > [aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz]"
> >
> > [bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz]"
> >
> > [aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy][bcdfgjklmnpqrstvxz][aehiouwy]"
> >
> > In some cases it needs tweaking to account for two consonants or two
> > vowels in some part of the word (for example peNTagon, aCRopolis, bicyCLe,
> > AErodynamic), so a few variations of the above are necessary to cover a
> > large percentage of words.
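
One way to experiment with such patterns outside of John's rule syntax is a
small external generator. The sketch below is an assumption about how that
could look, not what was actually run for the post: `candidates` and
`matches` are made-up helper names, and the mask letters C and V map to the
same two character classes as in the patterns above.

```python
from itertools import islice, product

CLASSES = {
    "C": "bcdfgjklmnpqrstvxz",
    "V": "aehiouwy",
}


def candidates(mask):
    """Lazily yield every string that fits a mask such as 'CVCCVCVC'."""
    pools = [CLASSES[symbol] for symbol in mask]
    for combo in product(*pools):
        yield "".join(combo)


def matches(word, mask):
    """True if `word` fits `mask` position by position."""
    return len(word) == len(mask) and all(
        ch in CLASSES[symbol] for ch, symbol in zip(word, mask)
    )


# Plain alternation covers 'italy'; a doubled consonant covers 'peNTagon'.
print(matches("italy", "VCVCV"))        # True
print(matches("pentagon", "CVCVCVCV"))  # False
print(matches("pentagon", "CVCCVCVC"))  # True

# The generator is lazy, so only the first few candidates are built here.
print(list(islice(candidates("CVCVC"), 3)))
```

Printing one candidate per line instead would let the output be piped into
a cracker that reads words from standard input.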
> >
> > An analysis of the English language and its linguistic patterns might give
> > a significant increase in human-like words or composite words that the
> > dictionaries do not contain (like name&surname). Ideally, we could have a
> > statistics program or an AI program extract rules covering 95%+ of the
> > words in a given language, so that combinations could be based on this
> > structure (with possible twists like adding stuff at the end). English is
> > a bit more difficult to do in a letter-by-letter format compared to
> > Greek/Italian but, ultimately, it's just more variations. A syllable
> > approach (i.e. combos of one-, two- and three-letter sequences) might also
> > be appropriate for English or other languages. For example, instead of
> > combining words, we could combine ready-made syllables: the syllable MO +
> > the syllable RE = the word MORE. The number of combinations compared to
> > 26^8 will drop dramatically.
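
The syllable idea can be sketched in Python as well. The syllable list here
is a tiny made-up example; a real one would have to be extracted from a
dictionary or from cracked passwords, and `syllable_words` is a hypothetical
helper name.

```python
from itertools import product


def syllable_words(syllables, count):
    """Yield every concatenation of `count` syllables, repeats allowed."""
    for parts in product(syllables, repeat=count):
        yield "".join(parts)


# Hypothetical syllable inventory, just enough to show the mechanics.
syllables = ["mo", "re", "ka", "li", "pa", "ni"]

words = list(syllable_words(syllables, 2))
print(len(words))       # 36, i.e. 6 syllables squared
print("more" in words)  # True: MO + RE
```

With n syllables and k slots the candidate count is n^k, so even a few
hundred two-letter syllables in four slots stay far below 26^8.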
> >
> > Have a great 2012...
> >