john-users - Re: Incremental attack properties questions



Message-ID: <BLU0-SMTP314CB12F5E0CB39173AB326FD260@phx.gbl>
Date: Sun, 6 Jan 2013 11:10:03 +0100
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-users@...ts.openwall.com
Subject: Re: Incremental attack properties questions

On 01/06/2013 04:06 AM, magnum wrote:
> On 5 Jan, 2013, at 13:00 , Frank Dittrich <frank_dittrich@...mail.com> wrote:
>> Even if you would get incremental mode working with non-ascii
>> characters, the incremental mode would sooner or later generate byte
>> sequences which are not valid utf-8 characters.
>> (This shouldn't happen with Markov mode, provided you generate your
>> custom stats file with valid input. There's just one exception if a byte
>> sequence for a non-ascii character at the end of the word gets cut off
>> due to maximum length or maximum Markov level limits.)
> 
> I really had no idea Markov is this good with UTF-8. This is cool stuff.

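As long as you don't have any characters which require more than 2 bytes
in UTF-8, Markov mode works really well. The only problem is that the byte
sequence of a multi-byte character can get cut off at the end of the word.
With only 2-byte characters, a continuation byte is never followed by
another continuation byte. If you add 3-byte characters to the mix, things
get worse, because then your input contains runs of continuation bytes in
the range 0x80-0xbf. Since the Markov stats only relate each byte to the
one before it, a continuation byte can then be followed by more
continuation bytes than the current character actually needs.
As long as there are not too many 3-byte or 4-byte characters in your
input, the number of invalid UTF-8 words generated will not be too bad.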
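To illustrate why 3-byte characters hurt, here is a small Python sketch. It is a simplification and not John's code: it ignores probabilities and Markov levels entirely and only records which (previous byte, next byte) pairs occur in the training words, assuming the model works on bytes. The helper names (learn, model_allows, cont_after_cont) are made up for this example. "あ" encodes as e3 81 82 and "ア" as e3 82 a2, so a model trained on them has seen 81->82 and 82->a2. That lets it chain three continuation bytes after a single 3-byte lead byte.

```python
# Simplified first-order byte model: only track which adjacent byte pairs
# were observed in training (nonzero transitions), not their probabilities.

def bigrams(data: bytes) -> set[tuple[int, int]]:
    """All pairs of adjacent bytes in data."""
    return set(zip(data, data[1:]))

def learn(words: list[str]) -> set[tuple[int, int]]:
    """Byte pairs observed in the UTF-8 encoding of the training words."""
    seen: set[tuple[int, int]] = set()
    for w in words:
        seen |= bigrams(w.encode("utf-8"))
    return seen

def is_cont(b: int) -> bool:
    """UTF-8 continuation bytes have the form 10xxxxxx (0x80-0xbf)."""
    return 0x80 <= b <= 0xBF

def cont_after_cont(seen: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Observed pairs in which a continuation byte follows another one."""
    return {(a, b) for a, b in seen if is_cont(a) and is_cont(b)}

def model_allows(candidate: bytes, seen: set[tuple[int, int]]) -> bool:
    """True if every adjacent byte pair of candidate was seen in training."""
    return bigrams(candidate) <= seen

# Only 2-byte characters: no continuation byte ever follows another one.
two_byte = learn(["café", "Müller", "señor"])
print(cont_after_cont(two_byte))  # set()

# 3-byte characters: "あ" = e3 81 82, "ア" = e3 82 a2.
three_byte = learn(["あア"])
bad = bytes([0xE3, 0x81, 0x82, 0xA2])  # "あ" followed by a stray 0xa2
print(model_allows(bad, three_byte))    # True: every pair was seen
try:
    bad.decode("utf-8")
except UnicodeDecodeError:
    print("invalid UTF-8")
```

Looking only at which transitions are possible, not how likely they are, is enough here because the question is which byte sequences the model can produce at all.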

(Once you finish the UTF-8 validity check for --markov mode used
together with --encoding=utf-8, --markov mode will be an almost perfect
fit for UTF-8 passwords.)
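For illustration only, here is a Python sketch of what such a validity check has to reject. It does not describe how John implements the check. Both helpers are hypothetical. is_valid_utf8 relies on Python's strict UTF-8 decoder. truncate_utf8 shows one way to apply a maximum length without splitting a character, assuming the word is valid UTF-8 before it is cut.

```python
def is_valid_utf8(candidate: bytes) -> bool:
    """Python's strict UTF-8 decoder rejects stray or missing continuation
    bytes, overlong forms and encoded surrogates."""
    try:
        candidate.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def truncate_utf8(word: bytes, max_len: int) -> bytes:
    """Cut a valid UTF-8 word to at most max_len bytes without leaving a
    partial multi-byte character at the end."""
    if len(word) <= max_len:
        return word
    cut = max_len
    # Step back over continuation bytes (0x80-0xbf) so the cut lands on
    # the first byte of a character.
    while cut > 0 and 0x80 <= word[cut] <= 0xBF:
        cut -= 1
    return word[:cut]

candidates = [b"caf\xc3\xa9", b"caf\xc3", bytes([0xE3, 0x81, 0x82, 0xA2])]
print([c for c in candidates if is_valid_utf8(c)])  # [b'caf\xc3\xa9']
print(truncate_utf8("café".encode("utf-8"), 4))     # b'caf'
```

Filtering after generation is simple, but it spends time producing candidates that are then thrown away. Avoiding them during generation would instead require the generator to know how many continuation bytes the current character still needs.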

Frank
