Date: Mon, 30 Apr 2012 10:57:36 +0200
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Password Generation on GPU

Hi all,

I'm afraid that part of this might be more on-topic for john-users.
But I hesitate to cross-post (because I am not sure which Followup-To
would be better).

On 04/30/2012 06:32 AM, Solar Designer wrote:
> 4. We may try to make some other cracking modes set_mask()-aware as well.
> 
> Incremental mode is potentially capable of using this interface for the
> last character position, except when it has the last character index
> fixed (and thus alters character indices in other positions only).  For
> example, when trying passwords of length 8, incremental mode would thus
> be able to use set_mask() 87.5% of the time in a long-running session.
> (The growing and reducing c/s rate may be confusing, though.)

You could even use it for the last character as well, if you add 0x00 to
the mask.
This would have to be done for the last position only. Preferably, you'd
even start with the shorter password, then compute all the others.
I am not sure how much of a problem the mixed lengths are for GPUs, but
you'd have length switches in incremental mode anyway.
Of course, this would work best with a chr file which has been generated
with passwords from john.pot without the last character.
Maybe we would need an alternative --make-charset-[gpu|mask|whatever]
switch.
The new "masked" incremental mode should work on CPU as well, even if
the primary purpose is to make better use of GPUs with fast hashes.
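
Just to make the chr file idea a bit more concrete, here is the kind of
preprocessing I have in mind. This is only an illustration (it assumes a
plain "hash:plaintext" pot line layout; formats whose hash part itself
contains ':' would need smarter parsing): read a pot file on stdin and
write the same lines with the last character of each plaintext removed,
so that the charset generation could be pointed at the truncated file
instead of the real john.pot.

/*
 * Illustration only, not JtR code: copy "hash:plaintext" pot lines to
 * stdout with the last character of each plaintext dropped.  Assumes
 * the hash part contains no ':'.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[1024];

	while (fgets(line, sizeof(line), stdin)) {
		char *plain;
		size_t len;

		line[strcspn(line, "\r\n")] = 0; /* strip the newline */

		plain = strchr(line, ':');
		if (!plain)
			continue; /* not a "hash:plaintext" line */
		plain++;

		len = strlen(plain);
		if (len < 2)
			continue; /* nothing left after dropping the last char */
		plain[len - 1] = 0; /* drop the last character */

		puts(line); /* hash plus truncated plaintext */
	}

	return 0;
}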

I wouldn't even specify the 0x00 as part of the mask (because then you
couldn't use strings anymore).
If you did, you'd have to use two additional parameters (mask array and
array length).
But if the "masked" incremental mode gets implemented with an array and
length anyway, it probably wouldn't make sense not to include 0x00 in
the mask array.
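
To illustrate what I mean (set_mask() does not exist yet, and this data
layout is just an assumption of mine, not a proposal for the real
interface): if the last-position mask is passed as an explicit array
plus length, 0x00 can be one of its members, and that entry simply
means "no last character here", i.e., the shorter password.

/*
 * Illustration only: a hypothetical last-position mask passed as an
 * array plus length, with 0x00 standing for "no last character".
 * Putting 0x00 first means the shorter password is tried first.
 */
#include <stdio.h>
#include <string.h>

static void try_last_position(const char *stem,
    const unsigned char *mask, int mask_len)
{
	char candidate[128];
	size_t stem_len = strlen(stem);
	int i;

	for (i = 0; i < mask_len; i++) {
		memcpy(candidate, stem, stem_len);
		if (mask[i]) {
			candidate[stem_len] = mask[i];
			candidate[stem_len + 1] = 0;
		} else {
			candidate[stem_len] = 0; /* 0x00: the shorter password */
		}
		puts(candidate); /* the real code would hash and test it */
	}
}

int main(void)
{
	const unsigned char mask[] = { 0x00, '1', '2', '!', 'a' };

	try_last_position("passwor", mask, sizeof(mask));
	return 0;
}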

> Wordlist mode with rules may potentially be able to use set_mask() for
> ruleset lines containing portions like Az"[190][0-9]".  However, that
> would be bad in two ways: it would confuse the rule preprocessor with
> the actual rule processor (making these things even more difficult to
> explain than they're now) and it would swap the words vs. rules
> processing order for the affected ruleset lines

If you have a file with words sorted by priority/popularity, this might
even be desired.
This file could either be the default password.lst, or the Facebook list
of first names, sorted by frequency.
In this case, you could just use a couple of rules (say: append a
special character), not yet knowing which of the rules will work best,
but knowing that names from the top 1000 lines are much more likely to
be used than names further down the list.
Once you have figured out which rules work best, you can interrupt your
session and start individual sessions with those rules that worked best
on the most frequently used words.

As long as a reversed order of processing rules vs. words is not
implemented, a workaround is to split the word list file into several
parts, e.g., lines 1-1000, lines 1001-50000, lines 50001-last.
Then, run your cracking session on the top 1000 words, continue with a
restricted set of rules on the next part (based on knowledge gained from
the top 1000 words), and use even fewer rules on the remaining part.
A disadvantage is that you'd need to split the file.
If you want to vary the number of words used in your first test run,
you'd even end up with multiple partial copies of the same wordlist.
E.g., for slow hashes and complex rules, you might want to start with
the top 100 words instead of the top 1000.

A good alternative to (optionally) reversing the order of rules and
words to be processed could be to provide a way to specify a range of
input words, say --from=1 --to=1000.
(For --from, 1 should be the default; the default for --to should be
the number of input words in the file.)

I think these options would be very useful, even if most users will
ignore them.
(The --from and --to values should correspond to line numbers in the
file, so that, due to skipped comments, you might end up with fewer
words being used.
The other alternative, where --from=1 would mean starting at line 12 of
password.lst because of the skipped comment lines, would probably be
even more confusing.)
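
To illustrate the intended semantics (both options are hypothetical, and
none of this is meant as real JtR code; I'm treating every line starting
with '#' as a comment just for the sake of the example): the --from and
--to values count raw lines of the file, and comment lines inside the
range are still skipped, so fewer words may actually be used.

/*
 * Illustration of the proposed --from/--to semantics: the range is
 * counted in raw lines of the wordlist file; comment lines within the
 * range count toward it but are not used; to == 0 means "up to the
 * last line".
 */
#include <stdio.h>
#include <string.h>

static void run_range(FILE *f, long from, long to)
{
	char line[1024];
	long lineno = 0;

	while (fgets(line, sizeof(line), f)) {
		lineno++;
		if (lineno < from)
			continue;
		if (to && lineno > to)
			break;
		if (line[0] == '#')
			continue; /* counted, but skipped as a comment */
		line[strcspn(line, "\r\n")] = 0;
		printf("candidate word: %s\n", line); /* feed the cracker here */
	}
}

int main(void)
{
	FILE *f = fopen("password.lst", "r"); /* e.g. --from=1 --to=1000 */

	if (!f)
		return 1;
	run_range(f, 1, 1000);
	fclose(f);
	return 0;
}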

> Instead, we may consider introducing the ability for rules to produce
> multiple candidate passwords.  Right now, each rule (as output by the
> preprocessor, when applicable) produces at most one candidate password
> for one input word.  There are good reasons (not limited to GPU
> acceleration) to allow for rules to produce multiple candidate
> passwords.

Yes, this would work as well.
But how does the user know when a single rule will produce multiple
candidate passwords, and when there are just several rules which each
produce one candidate password (or even no password, if the rule or the
word is skipped)?
Won't this be confusing for the users?
Do you want to introduce new rule commands here (or are you running out
of characters for new rule commands)?
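
Just to check whether I understand the difference correctly, here is a
toy illustration (nothing to do with the real rules engine): ten
preprocessor-expanded rules and one rule producing ten candidates can
generate exactly the same candidates; what mostly changes from the
user's point of view is the processing order, i.e., the words vs. rules
swap mentioned earlier.

/*
 * Toy contrast, not JtR code: Az"[0-9]" as ten expanded rules (rules in
 * the outer loop, words in the inner loop) versus one rule that itself
 * produces ten candidates per word (all candidates for one word before
 * moving on to the next).
 */
#include <stdio.h>

static const char *words[] = { "jennifer", "ashley" };
#define NWORDS 2

/* today: ten expanded rules, each applied to all words in turn */
static void expanded_rules(void)
{
	int digit, w;

	for (digit = 0; digit <= 9; digit++)
		for (w = 0; w < NWORDS; w++)
			printf("%s%d\n", words[w], digit);
}

/* proposed: a single multi-candidate rule */
static void multi_candidate_rule(void)
{
	int w, digit;

	for (w = 0; w < NWORDS; w++)
		for (digit = 0; digit <= 9; digit++)
			printf("%s%d\n", words[w], digit);
}

int main(void)
{
	puts("== ten expanded rules ==");
	expanded_rules();
	puts("== one multi-candidate rule ==");
	multi_candidate_rule();
	return 0;
}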

> So we may add some syntax that would do just that - e.g.,
> reuse curly braces for the purpose

OK, the curly braces for this new usage can easily be distinguished from
the { and } rule commands (rotate).
But there could be some instances of '{' meaning the literal character
'{', where rules that were correct in the past might become incorrect,
or where rules change their behavior. (Fortunately, '{' and '}' are not
used that frequently as plain characters, so this shouldn't be that much
of a problem.)

If we allow a rule to generate multiple passwords, we should also think
about enhancing --external filters in a similar way.


Frank
