Date: Thu, 14 Jul 2011 15:30:45 -0500
From: "JFoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Re: Upper casing (and lower casing) in john

>From: "magnum"
> On 2011-07-14 17:18, JimF wrote:
>> 1. rules: l u c C ?l ?u t TN (p P I may also be impacted). S V are also
>> likely candidates.
>
> Sometimes you really only want a-z (see 2. below) so for ANSI mode, I
> suggest we keep all the existing as-is and add alternate versions for some
> or all of them that use the new functions.

I do see your point about code pages other than 8859-1. We need to
research this a little more. The new functionality would add good things
to john. We just need to add it in a way that does no harm.

> In UTF-8 mode, we could add support for (fully) case-shifting whole words
> but as soon as we try to say "third character" or some such, rules are not
> UTF-8 aware. I have some vague thoughts about how to add future UTF-8
> awareness in rules (counting multibyte characters as one) but that is
> probably far away - and it will be much slower than today so it must be
> separated so it doesn't hit non-UTF8 mode.

For UTF-8 mode, we should really step back. First off, using a multi-byte
format like UTF-8 is very expensive, especially in the rules section,
where you often have to swap to another format to do the 'work', then
swap back.

I think for this, we should pre-process the rule if in -utf8 mode, and
determine IF conversions are needed. If all we are doing is appending
'123' to the tail of the word, then no conversion is needed. In that
case, we simply handle the string as though it were ANSI. However, if we
determine that there is something which would require conversions
(length, indexof, casing, etc.), then I would suggest we convert the word
into UCS-2 (UTF-16), and KEEP it that way, and once the rule has
completed, convert back into UTF-8 for processing by the format.
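A minimal sketch of the pre-processing idea above, written in Python for brevity (john itself is C). Everything here is an assumption for illustration: the function names `parse`, `needs_decoding`, and `apply_rule` are hypothetical, only a toy subset of rule commands is handled (`:`, `$X`, `^X`, `l`, `u`, `c`, `TN` with a single-digit N), and the split into "byte-safe" and "needs conversion" commands is a guess at how such a classification might look, not john's actual rule tables. Python's `str` stands in for the UCS-2/UTF-16 working buffer.

```python
# Toy subset of rule commands; the classification below is an assumption.
BYTE_SAFE = {":", "$", "^"}   # no-op, append char, prepend char
TAKES_ARG = {"$", "^", "T"}   # commands followed by one argument character


def parse(rule):
    """Split a rule string into (command, argument) pairs, skipping spaces."""
    cmds, i = [], 0
    while i < len(rule):
        c = rule[i]
        if c == " ":
            i += 1
        elif c in TAKES_ARG:
            cmds.append((c, rule[i + 1]))  # argument is the next character
            i += 2
        else:
            cmds.append((c, ""))
            i += 1
    return cmds


def needs_decoding(cmds):
    """True if any command depends on character positions or casing."""
    return any(c not in BYTE_SAFE for c, _ in cmds)


def apply_rule(word_utf8, rule):
    """Apply a rule to a UTF-8 encoded word and return UTF-8 bytes."""
    cmds = parse(rule)
    if not needs_decoding(cmds):
        # Fast path: only bytes are added at the ends, so no conversion.
        for c, arg in cmds:
            if c == "$":
                word_utf8 = word_utf8 + arg.encode("utf-8")
            elif c == "^":
                word_utf8 = arg.encode("utf-8") + word_utf8
        return word_utf8
    # Slow path: convert once, run every command, convert back once.
    text = word_utf8.decode("utf-8")
    for c, arg in cmds:
        if c == "$":
            text += arg
        elif c == "^":
            text = arg + text
        elif c == "l":
            text = text.lower()
        elif c == "u":
            text = text.upper()
        elif c == "c":
            text = text.capitalize()
        elif c == "T":
            n = int(arg)  # sketch only handles positions 0-9
            if n < len(text):
                text = text[:n] + text[n].swapcase() + text[n + 1:]
        elif c != ":":
            raise ValueError("command not handled in this sketch: " + c)
    return text.encode("utf-8")
```

With this sketch, `apply_rule("über".encode("utf-8"), "$1$2$3")` never leaves the byte domain, while a rule such as `T0` pays for exactly one decode and one encode no matter how many commands it contains. Note that the classification has to run on parsed commands, not raw characters: in `$l` the `l` is the character being appended, not a lowercase command.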

It would be 'nice' if we could have some rule that says to leave the word
in the already converted UTF-16 before passing it into the format (vs.
converting back to UTF-8, only to have it converted into UTF-16 again
later). However, that would likely take some modifications to the format
(possibly new function pointers, or different params to the existing
functions).

I am not looking at making changes right now. I am more looking at
finding out WHAT parts of john deal with casing (or lengths, indexes,
etc. when dealing with dictionary input words), and just what can be done
to improve the existing word handling/modification which john does. That
handling/manipulation is one of the CORE reasons why john is such a great
tool. Often, john is not the fastest tool out there (in 'raw' speed), but
often it is THE BEST at cracking passwords, because the correct candidate
can be presented sooner in the cracking session.

So, if we can find the locations where we can make this tool better, and
find good ways to exploit that, while not causing slowdowns for any
existing workflow, then that is what I would love to look into.

Jim.