Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 30 Apr 2012 08:32:48 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Password Generation on GPU

myrice, Samuele, magnum, all -

On Tue, Apr 17, 2012 at 03:19:18PM +0800, myrice wrote:
> For term of "password generation", I just want to make it clear. Do you
> mean that in real password generation, the single, wordlist and incremental
> mode? I mentioned real because we split benchmark with real password
> generation. In benchmark, we just use bench_set_keys(set key from test_fmt
> one by one).

Yes, at least for "fast hashes", we need to implement real (not just
benchmark-specific) candidate password generation on GPU.  However, it
doesn't necessarily have to be the same as the current JtR cracking
modes - it just needs to be something usable during a real attack on
real hashes.

While in the long run we could want to have JtR's usual cracking modes
on GPU, I think we could start with the following:

1. Enhance the formats interface by adding an optional method like:

void set_mask(int count, int *positions, char **charsets)

which would tell the format to substitute all possible characters from
the supplied charsets in the supplied character positions.  For example,
if called with (1, [5], ["abcdef"]), it would for every key set with
set_key() additionally substitute every one of the six characters in
character position number 5 (zero-based).  So crypt_all() would produce
6 times more password hashes than we had candidate passwords provided
via set_key().  cmp_all() would operate on all of the actually computed
hashes.  In order for the caller to know how many times to call
cmp_one(), cmp_exact(), and get_hash*() - when these are needed -
crypt_all() would be changed to return the number of hashes actually
computed.  (I am considering certain other changes as well, which may
result in these interfaces being revised some further or differently,
but I am not including those other changes here to avoid confusion.)

2. Add a "mask mode" to JtR.  (This is desirable anyway.  Other crackers
have it, but JtR makes this unnecessarily difficult for users by only
providing things like the KnownForce external mode instead.)  In this
mode, the user will specify a mask consisting of per character position
charsets (or just charset identifiers such as "lowercase letter").

Indeed, this cracking mode will make use of the set_mask() feature above
(when the current format provides it) for some of the character
positions (iterating over the rest on its own such that the program
doesn't stay inside crypt_all() forever and run out of memory for the
computed hashes, but is responsive to any-key, prints guessed
passwords, updated the pot/log/rec files, etc.)

3. Enhance --test to run an extra benchmark and output an extra line for
formats that offer set_mask().  For salted fast hashes, we'll need to be
outputting three lines: many salts (mask or not should not matter when
the number of salts is large enough), one salt and mask (simulate
mask-using cracking modes), one salt and no mask (simulate other
cracking modes).  For saltless fast hashes, it'd be two lines (no "many
salts", obviously).

4. We may try to make some other cracking modes set_mask()-aware as well.

Incremental mode is potentially capable of using this interface for the
last character position, except when it has the last character index
fixed (and thus alters character indices in other positions only).  For
example, when trying passwords of length 8, incremental mode would thus
be able to use set_mask() 87.5% of the time in a long-running session.
(The growing and reducing c/s rate may be confusing, though.)

Wordlist mode with rules may potentially be able to use set_mask() for
ruleset lines containing portions like Az"[190][0-9]".  However, that
would be bad in two ways: it would confuse the rule preprocessor with
the actual rule processor (making these things even more difficult to
explain than they're now) and it would swap the words vs. rules
processing order for the affected ruleset lines (different behavior for
different formats).  So we shouldn't implement this in that exact way.

Instead, we may consider introducing the ability for rules to produce
multiple candidate passwords.  Right now, each rule (as output by the
preprocessor, when applicable) produces at most one candidate password
for one input word.  There are good reasons (not limited to GPU
acceleration) to allow for rules to produce multiple candidate
passwords.  So we may add some syntax that would do just that - e.g.,
reuse curly braces for the purpose - and then have it use set_mask()
when available (tricky to do given the current separation between
rules.c, wordlist.c, and cracker.c - so we'll need to revise that, even
though it may make the source code structure more complicated).

BTW, the set_mask() thing is good not only for GPUs, but also for
OpenMP-parallelized (or otherwise parallelized) formats running on CPU
and achieving high c/s rates.  Specifically, I'd use it for LM hashes,
which currently don't scale beyond the performance of approximately 1.5
threads on CPU.  (Currently, we have to use other approaches to
parallelize these - such as jumbo's MPI support or running multiple
separate instances of John - where each separate process generates its
own stream of candidate passwords.)

set_mask() is also good in that it may be closely coupled with the
format's crypto code when that is helpful.  For example, for LM hashes
the already-transformed key bits matrix may be modified.  So
theoretically it might provide greater speed than what we'd achieve by
having each thread generate its entirely independent stream of candidate
passwords.

As usual, I'd appreciate any comments.

Alexander

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ