john-dev - Re: Upper casing (and lower casing) in john

Message-ID: <4E1F3EA5.9030709@bredband.net>
Date: Thu, 14 Jul 2011 21:08:21 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: Upper casing (and lower casing) in john

On 2011-07-14 17:18, JimF wrote:
> 1. rules: l u c C ?l ?u t TN (p P I may also be impacted). S V are also
> likely candidates.

Sometimes you really only want a-z (see 2. below), so for ANSI mode I
suggest we keep all the existing rules as-is and add alternate versions
of some or all of them that use the new functions.
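
To illustrate the split, here is a minimal Python sketch (john itself is
written in C, and the names uc_ascii and uc_latin1 are hypothetical, not
functions from the john source). One version touches only a-z, the
alternate version also shifts the iso-8859-1 letters:

```python
def uc_ascii(data: bytes) -> bytes:
    # bytes.upper() only maps ASCII a-z; every other byte is left alone.
    return data.upper()


def uc_latin1(data: bytes) -> bytes:
    # Assumes the input really is iso-8859-1.
    out = bytearray()
    for b in data:
        if 0x61 <= b <= 0x7A:
            # a-z -> A-Z
            b -= 0x20
        elif 0xE0 <= b <= 0xFE and b != 0xF7:
            # Accented lower-case letters sit 0x20 above their upper-case
            # forms; 0xF7 is the division sign and has no case.
            b -= 0x20
        # 0xDF (sharp s) and 0xFF (y with diaeresis) have no single-byte
        # upper-case form in iso-8859-1, so they are kept unchanged.
        out.append(b)
    return bytes(out)


word = b"caf\xe9"          # "café" in iso-8859-1
print(uc_ascii(word))      # only c, a, f are shifted
print(uc_latin1(word))     # 0xE9 is shifted to 0xC9 as well
```

Keeping both variants side by side means a rule that wants plain a-z
behaviour never pays for, or gets surprised by, the extended mapping.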

In UTF-8 mode, we could add support for (fully) case-shifting whole 
words but as soon as we try to say "third character" or some such, rules 
are not UTF-8 aware. I have some vague thoughts about how to add future 
UTF-8 awareness in rules (counting multibyte characters as one) but that 
is probably far away - and it will be much slower than today so it must 
be separated so it doesn't hit non-UTF8 mode.
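
A rough Python sketch of the position problem, loosely modelled on the
idea behind a rule like TN (toggle case at position N). Both function
names are hypothetical and this is not how john implements rules; it only
shows why counting bytes and counting characters disagree in UTF-8:

```python
def toggle_case_at_byte(data: bytes, pos: int) -> bytes:
    # Byte-oriented view: position N is the Nth byte, and only ASCII
    # letters can change case.
    if pos >= len(data):
        return data
    return data[:pos] + data[pos:pos + 1].swapcase() + data[pos + 1:]


def toggle_case_at_char(data: bytes, pos: int) -> bytes:
    # UTF-8-aware view: decode first so a multibyte character counts as
    # one position, then re-encode. The extra decode/encode is the kind
    # of overhead that should stay out of the non-UTF-8 path.
    text = data.decode("utf-8")
    if pos >= len(text):
        return data
    toggled = text[:pos] + text[pos].swapcase() + text[pos + 1:]
    return toggled.encode("utf-8")


word = b"p\xc3\xa4ss"  # "päss" in UTF-8: 4 characters, 5 bytes

# "Third character" (index 2): the byte version lands in the middle of
# the two-byte "ä" and changes nothing; the character version hits "s".
print(toggle_case_at_byte(word, 2))
print(toggle_case_at_char(word, 2))
```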

> 2. Formats (but these are one by one issues which need to be addressed
> directly). Oracle/mssql have been handled. LM has not, but by my
> understanding, what we have done already is the 'correct' method.

LM is special because it does not use iso-8859-x but the "OEM codepage", 
often cp437. We could easily add a special, complete, uc() function for 
that but then again the hashes may come from a Greek or French or 
Russian Windows, using cp737, cp852, cp866 or something else instead of 
cp437, and then our uc() for cp437 would just mess things up.
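
The mix-up can be sketched in Python using its bundled codecs. The helper
uc_codepage is hypothetical and only meant as an illustration; a real
implementation in john would more likely be a lookup table in C:

```python
def uc_codepage(data: bytes, codepage: str) -> bytes:
    # Upper-case each byte according to the given single-byte codepage.
    # A byte is kept as-is when it cannot be decoded or when its
    # upper-case form is not a single byte in the same codepage.
    out = bytearray()
    for b in data:
        try:
            upper = bytes([b]).decode(codepage).upper().encode(codepage)
        except UnicodeError:
            upper = b""
        out.append(upper[0] if len(upper) == 1 else b)
    return bytes(out)


# Byte 0x82 is a lower-case e-acute in cp437 but an upper-case Cyrillic
# letter in cp866, so the "right" answer depends on where the hash came from.
print(uc_codepage(b"\x82", "cp437"))  # shifted to the cp437 upper-case form
print(uc_codepage(b"\x82", "cp866"))  # already upper case, left alone
```

Applying the cp437 mapping to a password typed on a cp866 system would
therefore turn one upper-case Cyrillic letter into a different one, while
leaving genuinely lower-case Cyrillic bytes such as 0xA2 untouched.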

By the way, this also applies to iso-8859-1 that we are supporting now.
Unix hashes may well be made from iso-8859-2 or something else, and then
your new uc() for ANSI would just make a mess instead of case-shifting
correctly.

To handle this, we could add support for a number of codepages, but I
think we can just leave LM and other formats that don't use either of
these encodings as-is for now.

Anyway, I think the new full case-shifting support for iso-8859-1 and
Unicode is great. It was something that nagged me, but I never looked
into it closely enough to realise how tiny the case-significant part of
the Unicode space is.
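
For anyone who wants to check that for themselves, a quick Python sketch
(count_case_significant is a hypothetical helper; the exact number it
reports depends on the Unicode version shipped with the interpreter, so
treat the result as an estimate):

```python
import sys


def count_case_significant() -> int:
    # Count code points whose upper- or lower-case form differs from
    # the code point itself.
    count = 0
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.upper() != ch or ch.lower() != ch:
            count += 1
    return count


total = sys.maxunicode + 1
n = count_case_significant()
print(f"{n} of {total} code points ({100 * n / total:.2f}%)")
```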

magnum
