john-dev - Re: Mask mode (was Password Generation on GPU)

Message-ID: <20120808060356.GC22926@openwall.com>
Date: Wed, 8 Aug 2012 10:03:56 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Mask mode (was Password Generation on GPU)

myrice -

I thought you'd first try multiple sequential bitmaps, like what Bit Weasil
does in Multiforcer.

But I don't mind you working on both tasks in parallel.

On Wed, Aug 08, 2012 at 03:10:47AM +0800, myrice wrote:
> I am changing the hard-coded password generation to mask mode. If I got it
> right, hashcat's mask mode generates passwords based on the mask only.

I've never used it, but I think that historically mask mode was a
separate cracking mode invoked on its own, but later hashcat gained the
ability to chain cracking modes together (or does this only work for
multiple wordlist rulesets? I don't know).  If we have a hashcat user in
here, perhaps he can enlighten us.  Also, I think the mask syntax
originated in PasswordsPro, which pre-dates hashcat.  But I could be
wrong.  Anyhow, both *hashcat* and PasswordsPro are worth looking at if
we want to avoid unnecessary syntax incompatibilities.

> From the previous discussion, JtR could apply a mask to existing keys
> passed in through the set_keys() interface.

Please don't confuse JtR as a whole and the formats interface.  Yes, I
suggested that set_mask(), which is to be added to the formats
interface, will apply the mask to keys previously set with set_key().
Both are part of the formats interface.  This does not imply that JtR
as a whole would let the user combine masks with other cracking modes.
It may, or it may not, independently of how it's implemented in the
formats interface and whether a given format even provides set_mask().

The standalone mask mode (to be invoked on its own, not in combination
with any other mode) would have code to generate candidate passwords on
its own, without reliance on a format's set_mask().  It would also make
use of set_mask() when available, for some of the character positions
(say, 2).  It shouldn't do that for too many positions because then we'd be
spending too much time per crypt_all() call, which would make the
program non-interactive, prevent frequent enough updates of the .rec
file, and cause "ASIC hangs" on GPUs.
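A minimal sketch of that split, assuming the mask has already been turned into one character list per position: the mask mode walks the leading ("outer") positions with an odometer and would pass each resulting key to set_key(), while the trailing n_inner positions are left to set_mask(), so each key costs inner_work() candidates inside crypt_all().  Both helper names are hypothetical, not part of JtR.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: advance an odometer over the "outer" positions
 * 0..n_outer-1, the ones mask mode would iterate over on the CPU and pass
 * to set_key().  idx[i] indexes into lists[i]; start with idx[] all zero
 * (that is the first prefix).  The rightmost outer position changes
 * fastest.  Returns 0 once every combination has been produced, leaving
 * idx[] all zero again.
 */
static int next_prefix(const char *const *lists, int n_outer, int *idx)
{
	assert(n_outer >= 0);
	for (int i = n_outer - 1; i >= 0; i--) {
		if (lists[i][++idx[i]] != '\0')
			return 1;
		idx[i] = 0; /* wrap this position and carry to the left */
	}
	return 0;
}

/*
 * Hypothetical helper: number of candidates one key would expand to inside
 * crypt_all() if the format handled the last n_inner positions itself.
 */
static unsigned long long inner_work(const char *const *lists, int n_pos,
    int n_inner)
{
	unsigned long long work = 1;
	for (int i = n_pos - n_inner; i < n_pos; i++)
		work *= strlen(lists[i]);
	return work;
}
```

For example, with lists "ab", "012" and "xy" and n_inner = 1, the mode would visit 6 prefixes and the format would try 2 candidates per key.  Keeping n_inner small is what bounds the time per crypt_all() call; the remaining positions only cost more calls, not longer ones.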

I suggest that initially we only implement this standalone mask mode,
not supporting combinations with other cracking modes.  (In a sense
this will be inferior to your current hack.  That's a pity.)

Allowing for the use of masks along with other cracking modes is an
enhancement to add later.  Again, this should not depend on set_mask()
being available (it should also work for formats that don't provide it),
but set_mask() should be made use of (for some character positions, not
necessarily for the entire mask) when available.

I think that formats should provide the number of character positions
for which they can apply masks on their own - maybe a min-max range
(e.g., we may have params.min_mask_positions and
params.max_mask_positions).  Of course, it will always be allowed to
avoid set_mask() altogether - for other cracking modes - so the min will
only apply in case set_mask() is actually used.  To provide an example,
if you determine that iterating over two characters is optimal in a
given format, it can report "2" for both min and max, or if you want to
provide better performance with small charsets (such as digits only),
you may report min=2, max=3 (and indeed support both 2 and 3 in your
code then).  Mask mode's code in JtR itself must adapt to that.
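One way mask mode could adapt to such a range, sketched under the assumption that the format exposes the hypothetical params.min_mask_positions and params.max_mask_positions named above, and that we cap the number of candidates generated per key (max_work is an arbitrary tuning knob here, not an existing JtR setting): try the largest supported count first, fall back to smaller ones, and return 0 (don't use set_mask() at all) if nothing in the range fits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of a format's params; field names follow the proposal. */
struct mask_params {
	int min_mask_positions;
	int max_mask_positions; /* 0 means set_mask() is not provided */
};

/*
 * Pick how many trailing positions to hand to set_mask(): the largest
 * count in [min, max] whose per-key candidate count stays within
 * max_work.  Returns 0 (do everything on the CPU) if no count fits,
 * which is why min only matters when set_mask() is actually used.
 */
static int choose_inner_positions(const struct mask_params *p,
    const char *const *lists, int n_pos, unsigned long long max_work)
{
	assert(p->min_mask_positions <= p->max_mask_positions);
	for (int n = p->max_mask_positions;
	    n >= p->min_mask_positions && n > 0; n--) {
		if (n > n_pos)
			continue; /* mask is shorter than this count */
		unsigned long long work = 1;
		for (int i = n_pos - n; i < n_pos; i++)
			work *= strlen(lists[i]);
		if (work <= max_work)
			return n;
	}
	return 0;
}
```

With min=2, max=3 and a cap of 1000, a digits-only mask gets 3 positions (1000 candidates per key), while a lowercase mask gets only 2 (676), which matches the small-charset case described above.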

> I provided void set_mask(int count, int *positions, char* masks). For
> example, we could use set_mask(2, [2,4], ['d', 'l']). It will replace
> position 2 with digits and position 4 with lowercase letters. 'd' indicates
> digits and 'l' indicates lowercase letters; these are borrowed from hashcat
> and correspond to the character classes in JtR's wordlist rules.

I think these shortcuts for character sets should exist at the high level
only, not in the formats interface.  For example, if a user specifies
?l?l?l?l?d?l on the command-line, this string is passed on to JtR's mask
mode implementation (on CPU), which turns it into the strings "abc[...]xyz"
and "0123456789" (or maybe with characters sorted by decreasing frequency)
and then, e.g., passes the strings (character lists) for positions 4 and 5
into set_mask() and iterates over positions 0 - 3 on its own.
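The expansion step could look roughly like this.  It is a sketch assuming hashcat-style ?l, ?u and ?d placeholders (plus ?? for a literal question mark), with every other character taken literally; frequency-sorted lists and other placeholder classes are left out, and the names and limits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POSITIONS 32 /* hypothetical limit for this sketch */
#define MAX_CHARSET 96   /* enough for printable ASCII plus NUL */

/* Per-position character lists produced from a mask string. */
struct mask_lists {
	int count;                              /* number of positions */
	char chars[MAX_POSITIONS][MAX_CHARSET]; /* NUL-terminated lists */
};

/* Fill dst with the characters first..last inclusive. */
static void fill_range(char *dst, char first, char last)
{
	int n = 0;
	for (char c = first; c <= last; c++)
		dst[n++] = c;
	dst[n] = '\0';
}

/*
 * Expand a mask such as "?l?l?l?l?d?l" into one character list per
 * position.  Only ?l, ?u, ?d and ?? are handled; any other character is
 * taken literally.  Returns 0 on success, -1 on an unsupported mask.
 */
static int expand_mask(const char *mask, struct mask_lists *out)
{
	assert(mask != NULL && out != NULL);
	out->count = 0;
	while (*mask) {
		if (out->count >= MAX_POSITIONS)
			return -1;
		char *dst = out->chars[out->count];
		if (mask[0] == '?') {
			switch (mask[1]) {
			case 'l': fill_range(dst, 'a', 'z'); break;
			case 'u': fill_range(dst, 'A', 'Z'); break;
			case 'd': fill_range(dst, '0', '9'); break;
			case '?': strcpy(dst, "?"); break;
			default: return -1; /* unknown or truncated placeholder */
			}
			mask += 2;
		} else {
			dst[0] = *mask++; /* literal character */
			dst[1] = '\0';
		}
		out->count++;
	}
	return 0;
}
```

For ?l?l?l?l?d?l this yields six lists, the one at position 4 being "0123456789"; the caller would then hand lists 4 and 5 to set_mask() and iterate over 0 - 3 itself.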

On the other hand, if a significant speedup is expected for hard-coded
character lists (e.g., if you'd increment the ASCII code rather than
read the next character from an array), then we may consider an
interface that would accommodate that as well.  But we do need the
flexible interface supporting arbitrary character lists anyway, because
the user may specify arbitrary characters rather than use one of the
shortcuts.  BTW, this is one reason why the mask mode implementation
might choose to use set_mask() on character positions other than the
last few: there might be too few different characters in those (but
more in other positions).
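A simple heuristic for picking which positions to hand to set_mask(), assuming the goal is to maximize candidates per key: take the positions with the longest character lists, breaking ties toward the end of the mask.  This is only one possible policy, not something JtR does today; the function name and the 64-position limit are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Choose n positions to hand to set_mask(), preferring the longest
 * character lists (ties go to the later position).  Writes the chosen
 * indices to out[] in ascending order.  A simple O(n * n_pos) selection
 * is enough for mask-sized inputs.
 */
static void pick_mask_positions(const char *const *lists, int n_pos, int n,
    int *out)
{
	int taken[64] = { 0 }; /* hypothetical limit: n_pos <= 64 */
	assert(n_pos <= 64 && n >= 0 && n <= n_pos);
	for (int k = 0; k < n; k++) {
		int best = -1;
		size_t best_len = 0;
		for (int i = 0; i < n_pos; i++) {
			size_t len = strlen(lists[i]);
			if (!taken[i] && (best < 0 || len >= best_len)) {
				best = i;
				best_len = len;
			}
		}
		taken[best] = 1;
	}
	/* Emit in ascending order so the format sees positions left to right. */
	int m = 0;
	for (int i = 0; i < n_pos; i++)
		if (taken[i])
			out[m++] = i;
}
```

For a mask like ?l?l12 (two letters followed by literal digits), this picks positions 0 and 1 rather than the last two, which would give only one candidate per key.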

> But this may overlap. Suppose a wordlist file contains
> "password[1-9]" and we set the mask with set_mask(1, [9], ['d']). Then in
> crypt_all() we will get "password[1-9]" multiple times. Or should we just
> append to the keys rather than replace existing characters of the wordlist words?

As I explained above, with the initial implementation of mask mode this
issue won't arise.

When we later allow for combining of masks with other cracking modes, I
think we should be appending the masks.  In terms of the set_mask()
interface, I think appending may be requested e.g. by specifying the
positions as -1.
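To illustrate the difference between overstrike and appending, here is a hypothetical per-candidate helper in which a position of -1 (MASK_APPEND) requests appending, as suggested above, and an out-of-range overstrike position is reported as an error rather than silently skipped.

```c
#include <assert.h>
#include <string.h>

#define MASK_APPEND (-1) /* position value requesting append, as proposed */

/*
 * Build one candidate from a base key: for each mask slot, either
 * overstrike key[pos] or (pos == MASK_APPEND) append the character.
 * out must hold at least strlen(key) + n + 1 bytes.  Returns the
 * candidate length, or -1 if an overstrike position lies outside the
 * key (a case the high-level code should filter out beforehand).
 */
static int apply_mask(const char *key, int n, const int *pos,
    const char *chars, char *out)
{
	assert(key != NULL && out != NULL);
	size_t key_len = strlen(key);
	size_t len = key_len;
	memcpy(out, key, key_len);
	for (int i = 0; i < n; i++) {
		if (pos[i] == MASK_APPEND)
			out[len++] = chars[i];  /* append */
		else if (pos[i] >= 0 && (size_t)pos[i] < key_len)
			out[pos[i]] = chars[i]; /* overstrike */
		else
			return -1; /* position beyond the word, e.g. 4 in "am" */
	}
	out[len] = '\0';
	return (int)len;
}
```

Appending a digit to "password" yields "password0" through "password9" without touching the word's own characters; the -1 return corresponds to the 'am' case quoted below.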

> Also, what if some words don't have such positions? Using set_mask(2,
> [2,4], ['d', 'l']) as an example, if a word is 'am', it does not have
> position 4, and am_'l' (_ is null) does not seem like a meaningful word.
> I prefer to discard these invalid positions.

The high-level code should ensure that this issue does not occur inside
a format's set_mask().

> Another way of using masks is to use the mask alone (or is that the real
> mask mode?).

Exactly.

> We would not use set_keys() and only use set_masks().

No, we'll use both.

> For
> example, we could use set_mask(4, [1,2,3,4], ['d', 'd', 'd', 'd']) to
> produce 0000-9999 on the GPU. But if the mask is too large, such as
> aaaaaaaa-zzzzzzzz, I am afraid of long GPU runs that cause ASIC hangs.

This is one of the reasons to use both functions.
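Rough arithmetic for the example above: ?d?d?d?d is only 10^4 candidates, but eight lowercase positions are 26^8.  If set_mask() covers only the last two positions, each key set with set_key() expands to 26^2 = 676 candidates inside crypt_all(), and the remaining 26^6 combinations are spread across many separate calls:

```latex
10^4 = 10\,000
\qquad
26^8 = 208\,827\,064\,576
     = \underbrace{26^6}_{\text{keys via set\_key()}}
       \times \underbrace{26^2}_{\text{per key via set\_mask()}}
     = 308\,915\,776 \times 676
```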

> I am testing mask mode only in raw-md5-opencl and have not added the
> set_mask() interface to john yet; I call set_mask() manually in reset(). I
> want to make these points clear before I add the interface: overlap,
> whether to only append to words, and invalid positions.

Initially, set_mask() needs to support overstrike only, and it should
accept strings with character lists for each position.
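A rough sketch of what the format side of an overstrike-only set_mask() taking one character list per position might look like.  The signature, the global state, and the MAX_MASK and PLAINTEXT_LENGTH limits are hypothetical; the mixed-radix decode in mask_candidate() shows how a kernel could map a candidate index (e.g., derived from the OpenCL work-item ID) to a password without storing the expanded keys.

```c
#include <assert.h>
#include <string.h>

#define MAX_MASK 4          /* hypothetical per-format max_mask_positions */
#define PLAINTEXT_LENGTH 55 /* hypothetical key buffer limit */

/* State saved by the hypothetical set_mask() for the next crypt_all(). */
static int mask_count;
static int mask_pos[MAX_MASK];
static const char *mask_list[MAX_MASK];

/* Overstrike only: lists[i] holds every character to try at positions[i]. */
static void set_mask(int count, const int *positions, const char *const *lists)
{
	assert(count >= 0 && count <= MAX_MASK);
	mask_count = count;
	for (int i = 0; i < count; i++) {
		assert(lists[i][0] != '\0'); /* empty lists would divide by zero */
		mask_pos[i] = positions[i];
		mask_list[i] = lists[i];
	}
}

/* Number of candidates each key expands to under the current mask. */
static unsigned long mask_size(void)
{
	unsigned long n = 1;
	for (int i = 0; i < mask_count; i++)
		n *= strlen(mask_list[i]);
	return n;
}

/*
 * Write the index-th candidate for key into out (mixed-radix decode, last
 * mask position varying fastest).  The high-level code is assumed to have
 * ensured every mask position lies within key.
 */
static void mask_candidate(const char *key, unsigned long index, char *out)
{
	strcpy(out, key);
	for (int i = mask_count - 1; i >= 0; i--) {
		size_t radix = strlen(mask_list[i]);
		out[mask_pos[i]] = mask_list[i][index % radix];
		index /= radix;
	}
}
```

With the digit and lowercase lists at positions 4 and 5, each key expands to 260 candidates: index 0 gives "...0a", index 27 gives "...1b", and index 259 gives "...9z".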

On the other hand, if it is easier for you to implement appending now
(such as because you don't want to implement the standalone mask mode on
your own yet), please do that.

Thanks,

Alexander