john-dev - Re: Collecting same-length passwords

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20120401202745.GD8484@openwall.com>
Date: Mon, 2 Apr 2012 00:27:45 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Collecting same-length passwords

On Fri, Mar 30, 2012 at 12:37:50AM +0200, magnum wrote:
> Just thinking loud here:
> 
> Some formats benefit a lot from having same-length candidates in a
> batch. RAR is an example (having one-length candidates would let us use
> SSE2, or it would boost OpenCL and OMP), as is tripcode iirc.

For tripcodes, we're grouping by two characters, not by length.

For AFS, we could group by length (short vs. long) to allow for
bitslicing, but we don't do that yet (to be done, although there's
almost no demand for AFS lately).

> Could we introduce an optional "mode" of operation where the return of
> saved_key() indicates whether we are "ready" for a crypt_all() call or
> not? As long as set_key() returns false, we just keep calling it with
> more candidates.

We could.

> If this is implemented I could have a buffer for each
> length. Whenever a buffer get full, set_key() returns true and this
> triggers crypt_all.

Yes, but unfortunately crypt_all() will have to process all buffers, not
just the one that got full.  So this will be only a minor improvement
over what we have now unless we make other changes to the interface as
well - e.g., allow for crypt_all() to process a subset of the keys only -
but this gets tricky.

Besides, for .rec file updates we need some checkpoints where we know
that all candidate passwords generated by some point are already fully
processed.  If we only see e.g. just one candidate password that falls
in a certain buffer, but we also process lots of other candidate
passwords, then we'll have to choose between processing this
almost-empty buffer (which is inefficient) and not reaching a checkpoint
for a long time (so the .rec file can't get updated).

For tripcodes, this is currently "solved" simply by buffering a lot of
candidate passwords, yet keeping that number sane so that the program
remains somewhat interactive and we do have regular checkpoints.

> Maybe for saltless formats this could be implemented in current API...
> What if I announce a dummy crypt_all() and always call the real
> crypt_all() from within set_key() when appropriate?

You may.  However, you need to declare some max_keys_per_crypt anyway,
and your crypt_all() shouldn't be entirely dummy (it should be there to
process less-than-full buffers when max_keys_per_crypt is reached - and
that way you'll have checkpoints too).

Does this have any advantage over simply buffering stuff and doing
everything in crypt_all(), like we currently do for tripcodes?  Well,
maybe the cache hit rate will be higher (buffered candidates will be
hashed sooner).  So, yes, we may try it for something fast like
tripcodes.  RAR is too slow for this effect to be measurable.

> Even simpler, I could just use a max_keys_per_crypt that guarantees that
> at least one buffer is full.

Yes, I tried that for tripcodes, but after testing I ended up specifying
a smaller value for now.

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.