john-dev - "valid character" class

Message-ID: <4E41134F.1040103@bredband.net>
Date: Tue, 09 Aug 2011 13:00:31 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: "valid character" class

On 2011-08-05 01:41, Solar Designer wrote:
> On Fri, Aug 05, 2011 at 01:32:41AM +0200, magnum wrote:
>> What is the ?z (any character) class used for? Is it used anywhere, by
>> anyone? Its current meaning is indeed *any* character, valid or not.
> [...]
>> Maybe it was meant for PP stuff, much like the ':'.
>
> Exactly.  It's a no-op produced by some preprocessor expressions for
> some of the expanded rules.  I have a to-do item to have JtR optimize
> out such no-ops just like it does for ':' lately.

OK, I think we'll go for ?y for 'valid' then.

Question to *all*: There are some characters that are truly invalid for 
a codepage, like 0x98 in cp1251. There are also characters that are not 
really invalid per the Unicode spec but are control characters, for 
example 0x80..0x9F in most (all?) ISO-8859-xx codepages. Should we treat 
the latter as invalid? There are pros and cons. My personal vote is that 
we should treat them as invalid, i.e. the rule !?Y would drop any 
candidate that contains 0x80..0x9F if we're using --enc=iso-8859-1, but 
only 0x98 if using --enc=cp1251.
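
To make the proposal concrete, here is a minimal Python sketch of the 
per-codepage check. It is not JtR code: the table INVALID_OCTETS and the 
helper names are hypothetical, and the table only covers the two 
codepages mentioned above.

```python
# Hypothetical per-codepage tables of octets to treat as invalid.
# Illustrative only; not taken from the JtR source.
INVALID_OCTETS = {
    "iso-8859-1": frozenset(range(0x80, 0xA0)),  # 0x80..0x9F control range
    "cp1251": frozenset({0x98}),                 # the one unassigned octet
}


def has_invalid_octet(candidate: bytes, encoding: str) -> bool:
    """Return True if any octet of candidate is invalid for encoding."""
    invalid = INVALID_OCTETS[encoding]
    return any(octet in invalid for octet in candidate)


def keep_candidate(candidate: bytes, encoding: str) -> bool:
    """Mimic the proposed reject rule: drop words with an invalid octet."""
    return not has_invalid_octet(candidate, encoding)
```

With these tables, a word containing 0x85 would be dropped under 
iso-8859-1 but kept under cp1251, while a word containing 0x98 would be 
dropped under both.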

One effect of doing so is the ability to reject/accept UTF-8 encoded 
words (from a mixed wordlist like RockYou.txt) using such rules, because 
many non-ASCII characters in UTF-8 contain octets in that range (UTF-8 
continuation octets span 0x80..0xBF). Of course, this could also be 
achieved with another new, UTF-8 specific, character class.
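
A quick way to see how far the 0x80..0x9F test reaches for UTF-8 input, 
and what a UTF-8 specific test could look like instead, is the Python 
sketch below. Both function names are made up for illustration, and the 
second function is only one possible definition of such a class (the 
word decodes as UTF-8 and is not pure ASCII).

```python
def octets_in_control_range(word: str) -> list:
    """List the octets of word's UTF-8 encoding that fall in 0x80..0x9F."""
    return [b for b in word.encode("utf-8") if 0x80 <= b <= 0x9F]


def is_multibyte_utf8(candidate: bytes) -> bool:
    """True if candidate is valid UTF-8 with at least one non-ASCII char."""
    try:
        text = candidate.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return not text.isascii()
```

For example, U+00C0 encodes as C3 80 and so has an octet in the range, 
whereas U+00E9 encodes as C3 A9 and does not; a UTF-8 specific test like 
is_multibyte_utf8() catches both.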

magnum
