Date: Sat, 05 Mar 2011 17:22:20 +0100
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: Re: md5_gen, proposed functionality

On 03/05/2011 02:59 PM, magnum wrote:
> ---8<---------8<---------8<------
>
> I have left out the UTF-8 discussion from this because it makes things
> much more complicated and I think we should address that later. But this
> "casting" conversion will ONLY work for ASCII and ISO-8859-1 wordlists.
> This is a current problem with NT hashes too. And it's much lower
> priority so let's leave that for now.

However, once you have understood everything above that line, I'd like to add the following to keep in mind when designing this, for future enhancements. This goes further than md5_gen: everything below also applies to NT, mscash and the other formats that are made from Unicode plaintext.

Ideally we should have two different Unicode conversion functions.

One is the one already used in many formats (and what I suggested for md5_gen in my previous message): insert a null byte after each character. This is ideally done in set_key() and get_key(), as it can be done with almost no performance hit. This is a fully valid conversion between ISO-8859-1 and UTF-16, but if you feed it anything else you will just end up with garbage.

The other one is a true UTF-8 -> UTF-16 conversion. This would be needed if we want to be able (at all) to crack hashes made from a UTF-16 representation of characters not present in ISO-8859-1. An NT password consisting of just one Euro sign is currently uncrackable by John, *regardless* of "8 bit bruteforce" or whatever else you try to feed it. You simply can't get around it with conversions of wordlists or rules. There is simple code from (for example) Unicode Inc that is free to use. It's very lightweight, but if we're attacking very fast hashes it's still something like a 50% performance hit. If we ever want to be able to crack such passwords, though, we will need it sooner or later.
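To make the difference between the two conversions concrete, here is a minimal sketch in Python (not John's C code). The function names cast_to_utf16le and utf8_to_utf16le are made up for this illustration, and little-endian UTF-16 is assumed, since that is what NT hashes are computed over.

```python
def cast_to_utf16le(raw: bytes) -> bytes:
    """Quick "cast": append a null byte after every input byte.

    Hypothetical helper for illustration. It is only a correct
    conversion when the input is ASCII or ISO-8859-1.
    """
    out = bytearray()
    for b in raw:
        out.append(b)   # low byte: the input byte itself
        out.append(0)   # high byte: always zero
    return bytes(out)


def utf8_to_utf16le(raw: bytes) -> bytes:
    """True conversion: decode the UTF-8 input, re-encode as UTF-16LE."""
    return raw.decode("utf-8").encode("utf-16-le")


# ISO-8859-1 input: the cast gives exactly the right UTF-16.
e_acute_latin1 = b"\xe9"                     # U+00E9 in ISO-8859-1
assert cast_to_utf16le(e_acute_latin1) == "\u00e9".encode("utf-16-le")

# A Euro sign (U+20AC) is not in ISO-8859-1. Its UTF-8 form is three
# bytes, and casting those bytes yields three wrong code units.
euro_utf8 = "\u20ac".encode("utf-8")         # b"\xe2\x82\xac"
print(cast_to_utf16le(euro_utf8).hex())      # e2008200ac00 (garbage)
print(utf8_to_utf16le(euro_utf8).hex())      # ac20 (the real UTF-16LE)
```

No sequence of input bytes can make the cast produce the code unit 0x20AC, because the high byte is always zero. That is why the single Euro sign password stays out of reach whatever the wordlist encoding.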
One solution could be an "--utf8" switch to John, telling it that the wordlists are encoded in UTF-8. This would only affect formats like NT and the proposed Unicode function in md5_gen (other formats should just ignore it, except for the suggested new reject rule mentioned below). It would tell md5_gen that when the MD5GenBaseFunc__convert2unicode function is used, it should call the real UTF-8 to UTF-16 code instead of doing the quick cast. The same functionality could be added to the NT and mscash formats. I have experimental versions of those formats with true UTF-8 support added, but hardcoded so you can't turn it off.

Some wordlist rules work pretty badly with multibyte characters, so there could also be a new reject rule, perhaps "-u", simply meaning "reject rule if --utf8 option is used" or possibly (and better) "reject rule if --utf8 option is used UNLESS the candidate is 7-bit characters only".

There are completely different ways to do all this, but I have given it a lot of thought and I think this gives the most bang for the buck, as well as the smallest performance hit. The changes are very small compared to most other alternatives I can think of.

magnum
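As a rough illustration of the proposal above, the following Python sketch shows how a single flag could select between the two conversions, and how the stricter variant of the suggested "-u" reject rule could be evaluated. The names to_unicode and reject_rule, and the boolean utf8 parameter standing in for the "--utf8" switch, are assumptions made for this example. None of this is existing John code.

```python
def to_unicode(candidate: bytes, utf8: bool) -> bytes:
    """Return the UTF-16LE form of a candidate password.

    Hypothetical dispatcher: `utf8` stands in for the proposed --utf8
    switch. Without it, each byte is taken as one ISO-8859-1 character,
    which is equivalent to the quick null-byte "cast".
    """
    if utf8:
        return candidate.decode("utf-8").encode("utf-16-le")
    return candidate.decode("latin-1").encode("utf-16-le")


def reject_rule(candidate: bytes, utf8: bool) -> bool:
    """Sketch of the stricter "-u" semantics.

    Reject the rule when --utf8 is in effect, UNLESS the candidate
    consists of 7-bit characters only.
    """
    return utf8 and any(b & 0x80 for b in candidate)


# Pure ASCII candidates are never rejected, with or without --utf8.
print(reject_rule(b"password", utf8=True))               # False

# A multibyte candidate is rejected only when --utf8 is in effect.
euro_word = "\u20acuro".encode("utf-8")
print(reject_rule(euro_word, utf8=True))                 # True
print(reject_rule(euro_word, utf8=False))                # False

# Why byte-oriented rules misbehave: reversing the bytes of a UTF-8
# string can leave a sequence that is no longer valid UTF-8.
reversed_bytes = "caf\u00e9".encode("utf-8")[::-1]       # b"\xa9\xc3fac"
try:
    to_unicode(reversed_bytes, utf8=True)
except UnicodeDecodeError:
    print("reversed candidate is not valid UTF-8")
```

The 7-bit exception matters because most wordlist entries are plain ASCII, and for those every rule is as safe with --utf8 as without it.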