john-dev - Re: Re: md5_gen, proposed functionality

Message-ID: <4D72633C.6030303@bredband.net>
Date: Sat, 05 Mar 2011 17:22:20 +0100
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: Re: md5_gen, proposed functionality

On 03/05/2011 02:59 PM, magnum wrote:
> ---8<---------8<---------8<------
>
> I have left out the UTF-8 discussion from this because it makes things
> much more complicated and I think we should address that later. But this
> "casting" conversion will ONLY work for ASCII and ISO-8859-1 wordlists.
> This is a current problem with NT hashes too. And it's much lower
> priority so let's leave that for now.

However, once you have understood everything above that line, I'd like
to add the following to keep in mind when designing this, for future
enhancements. This now goes beyond md5_gen: everything below also
applies to NT, mscash and the other formats that hash a Unicode
plaintext:

Ideally we should have two different Unicode conversion functions.

One is the one already used in many formats (and what I suggested for
md5_gen in my previous message): insert a null byte after each
character. This is ideally done in set_key() and get_key(), as it costs
almost nothing in performance. It is a fully valid conversion from
ISO-8859-1 to UTF-16, but if you feed it anything else you will just
end up with garbage.
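
A minimal sketch of that quick cast, for illustration only. The function
name and signature are made up here, not taken from any existing format,
and little-endian UTF-16 output is assumed. It works because the first
256 Unicode code points coincide with ISO-8859-1:

```c
#include <stddef.h>

/* Hypothetical helper (not John's actual code): widen each ISO-8859-1
 * byte to one UTF-16LE code unit by appending a zero high byte.
 * 'out' must have room for 2 * len bytes. Returns bytes written. */
static size_t latin1_to_utf16le(const unsigned char *in, size_t len,
                                unsigned char *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        out[2 * i]     = in[i]; /* low byte: the character itself */
        out[2 * i + 1] = 0;     /* high byte: always zero */
    }
    return 2 * len;
}
```

Fed with UTF-8 input, each byte of a multibyte sequence would be widened
separately, which is the garbage mentioned above.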

The other one is a true UTF-8 -> UTF-16 conversion. This would be needed 
if we want to be able (at all) to crack hashes made from a UTF-16 
representation of characters not present in ISO-8859-1. An NT password 
consisting of just one Euro sign is currently uncrackable by John, 
*regardless* of "8 bit bruteforce" or whatever you try to feed it. You 
simply can't get around it with conversions of wordlists or rules.

There is simple code from (for example) Unicode Inc that is free to use. 
It's very lightweight but if we're attacking very fast hashes, it's 
still something like a 50% performance hit. But if we ever want to be 
able to crack such passwords we will need it sooner or later.
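
To give an idea of what such a conversion involves, here is a rough
sketch. It is NOT the Unicode Inc code, the function name is made up,
and it only does basic checking (it does not reject overlong encodings
or UTF-8 encoded surrogates, which a real implementation should):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode UTF-8 into UTF-16 code units.
 * Returns the number of code units written, or -1 on malformed
 * input or if 'out' is too small. */
static int utf8_to_utf16(const unsigned char *in, size_t len,
                         uint16_t *out, size_t outmax)
{
    size_t i = 0, n = 0;

    while (i < len) {
        uint32_t c = in[i++];
        int extra;

        if (c < 0x80) extra = 0;                          /* ASCII */
        else if ((c & 0xE0) == 0xC0) { c &= 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; }
        else return -1;            /* invalid lead byte */

        if ((size_t)extra > len - i)
            return -1;             /* truncated sequence */
        while (extra-- > 0) {
            if ((in[i] & 0xC0) != 0x80)
                return -1;         /* not a continuation byte */
            c = (c << 6) | (in[i++] & 0x3F);
        }

        if (c >= 0x10000) {        /* needs a surrogate pair */
            if (c > 0x10FFFF || n + 2 > outmax)
                return -1;
            c -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (c >> 10));
            out[n++] = (uint16_t)(0xDC00 | (c & 0x3FF));
        } else {
            if (n + 1 > outmax)
                return -1;
            out[n++] = (uint16_t)c;
        }
    }
    return (int)n;
}
```

The Euro sign is U+20AC, which is the three bytes E2 82 AC in UTF-8 and
the single code unit 0x20AC in UTF-16. Its high byte is not zero, so
inserting null bytes can never produce it.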

One solution could be a "--utf8" switch to John, telling it that the
wordlists are encoded in UTF-8. This would only affect formats like NT
and the proposed Unicode function in md5_gen (other formats should just
ignore it, except for the suggested new reject rule mentioned below). It
would tell md5_gen that when the MD5GenBaseFunc__convert2unicode
function is used, it should call the real UTF-8 to UTF-16 code instead
of doing the quick cast.
The same functionality could be added to the NT and mscash formats. I
have experimental versions of those formats with true UTF-8 support
added, but hardcoded so you can't turn it off.

Some wordlist rules work pretty badly with multibyte characters, so
there could also be a new reject rule, perhaps "-u", simply meaning
"reject rule if --utf8 option is used" or possibly (and better) "reject
rule if --utf8 option is used UNLESS the candidate is 7-bit characters
only".
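
The second variant of that test could look something like this. It is
only a sketch: both function names and the utf8_enabled flag are
hypothetical and do not exist in John's rules engine:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the candidate is
 * plain 7-bit ASCII, 0 otherwise. */
static int is_7bit(const unsigned char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (s[i] & 0x80)
            return 0;   /* high bit set: part of a multibyte sequence */
    return 1;
}

/* Hypothetical check for a "-u" reject flag: skip the rule only when
 * --utf8 is in effect AND the candidate contains 8-bit bytes. */
static int reject_for_utf8(int utf8_enabled,
                           const unsigned char *cand, size_t len)
{
    return utf8_enabled && !is_7bit(cand, len);
}
```

A rule that reverses the word byte by byte, for example, would scramble
the bytes of a UTF-8 sequence, but is perfectly safe on a pure 7-bit
candidate.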

There are completely different ways to do all this, but I have given
it a lot of thought and I think this gives the most bang for the buck,
as well as the smallest performance hit. The changes are very small
compared to most other alternatives I can think of.

magnum