john-users - Re: john improvement suggestions

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20051221023242.GA17029@openwall.com>
Date: Wed, 21 Dec 2005 05:32:42 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re:  john improvement suggestions

Hi Radim,

First of all, thank you for your suggestions - and thank you for posting
them in here rather than sending them to me privately. :-)

On Mon, Dec 19, 2005 at 09:48:30PM +0000, Radim Horak wrote:
> 1. Bugs and annoyances
> - I have passwords (Traditional DES) from some old linux box, that are longer 
> than 16 chars, ie. consist of 3 hashes (crypt24?). John ignores such passwords 
> completely. I have tested them by manually cutting them. The 3rd hash uses salt 
> from the beginning of 2nd hash as 2nd hash uses the salt from beginning of the 
> 1st hash. I cannot provide the hashes nor I have access to that old linux box.

Yes, this should be implemented, but I wanted to see some samples from
commercial Unices first:

	http://article.gmane.org/gmane.comp.security.openwall.john.user/165

> - Is it possible to disable log

Currently, no (although you can symlink it to /dev/null).

This is a reasonable request.  I've added it to the TODO for
consideration for post-1.7.

> to avoid very large log files (with certain rules)?

If you have a lot of rules (perhaps due to the rule preprocessor), you
would run into another problem first: John tries to validate the syntax
of all rules (after preprocessor expansion) at startup.  If some of your
preprocessor commands would expand into millions of rules, John startup
would be slow.  Maybe this syntax check should be made optional, or
maybe it should be limited to the first million of rules.

> 2. Compilation
> - I have tested MSVC compiler with john.  
> (freely avail. here: http://msdn.microsoft.com/visualc/vctoolkit2003/)
> I compiled the encryption routines with these parameters:
> cl /c /TC /G7 /Ox /O2 LM_fmt.c
> and the rest (I guess more POSIX dependent files) under the cygwin. The result 
> was notable improvement in Lanman DES cracking speed (around 19% increase), on 
> AthlonXP cpu (comparing to gcc -O3 -march=athlon-xp). Other algorithms were 
> performing at the same speed.

I really don't think that the performance improvement you've observed is
related to this compiler producing better code (I do not know whether
that is the case).  LM_fmt.c is not really performance critical.  For LM
hashes on x86/MMX, the performance critical code is primarily in DES_bs.c
and indeed in x86-mmx.S.

One possible guess is that the apparent improvement is due to the changed
code and data placement in L1 instruction/trace and L1 data caches.
That's pure luck.  The code could have become "slower" just as easily.

John does make an attempt to reduce this kind of uncertainty by keeping
all the frequently accessed data structures together (struct DES_bs_all),
and 19% sounds excessive for x86 (it would not surprise me on Alpha with
direct-mapped caches - I've seen much worse things happen there).

Here's another guess: maybe you were not building for MMX at all?  If
your benchmarks are for pure C code and you were recompiling DES_bs* as
well, then I'm not surprised.  There can be this kind of a difference
between C compilers, especially on x86.  But the important thing is that
it is irrelevant since anyone who cares about performance would be
making use of the MMX code.

Can you provide the specific "john --test" outputs for both builds and
tell us your CPU clock rate (real and P4 rating)?

Overall, I do not see a need to support building John with MSVC.  That
would complicate the code with more #ifdef's for no good reason.

> - At this point, I'd like to see some way to increase the benchmarking time to 
> get more coherent/stable/exact figures.

As explained above, it's not only the short benchmark time which is
responsible for the differences.  There will be differences between
different builds and between different invocations of John.  Of the
supported hash types, LM hashes will usually be impacted the most since
they are very quick resulting in frequent branches between pieces of
code which come from different object files.  (Yes, a possible
optimization would be to combine pieces of code from inc.c, cracker.c,
LM_fmt.c, DES_bs.c, and DES_bs_b.c or x86-mmx.S in one source file.
But it would make the program hard to maintain.)

Anyway, the benchmarking time can be adjusted: it's BENCHMARK_TIME in
params.h.  You may also want to patch the __CYGWIN32__ checks out of
bench.c, forcing it to display virtual (processor) time c/s rates on
Windows.  (Normally, this is disabled since it does not work right on
Windows 9x.)

> 3. Optimizations
> - In bigcrypt and other splitted psw. hashes, the first part(s) of the password 
> has known length of 8 (7 in LM). When cracking such passwords, the password 
> candidates (either from dictionary or incremental mode) that are shorter should 
> be filtered out (skipped).

Yes, I had this idea, too, and it's been suggested a few times by
others.  Unfortunately, I do not see a way to implement it without
introducing much code complexity and/or performance overhead.

For the saltless LM hashes there's almost no extra cost in trying
shorter candidate passwords against all of the loaded hashes if there
remains at least one hash for which that is needed.  So let's
concentrate on bigcrypt:

For each given salt, there may be hashes both of known plaintext
lengths (non-last parts of split encodings) and of unknown ones
(non-split or last parts of split encodings).

There will have to be per-salt flags indicating whether there are still
any uncracked unknown-plaintext-length hashes for that salt.  There will
need to be quite some extra code to maintain those flags (it won't be
performance critical, though).  There will also need to be per-hash
flags indicating whether the plaintext length for this hash is known.

Then, for salts with only known-plaintext-length hashes, some candidate
passwords will have to be skipped.  However, those same candidate
passwords will need to be tried against other salts.  So the number of
candidate passwords to try will be different for different salts.

Unfortunately, this is not in line with the way John does things
currently and with algorithms such as bitslice DES (which assume a fixed
number of keys per invocation).  To overcome this difficulty, John would
need to maintain two separate sets of buffered keys (doubling the number
of buffered keys - while L1 data cache size remains constant!) and
invoke the low-level crypto routines for the appropriate sets of salts
whenever either buffer fills up.

Does this subtle optimization for the rarely-used bigcrypt justify the
added complexity and the likely slight reduction in c/s rate?  I am
really not sure.  I'd rather spend my time on more obvious improvements.

(A similar approach - also maintaining two sets of buffered keys - could
help to bitslice AFS passwords hashing achieving some great speedup.  If
those hashes were very common, I would do that.  But they are not.)

> - Optimize regexp expansion order [a-z] according to some general/usage 
> statistics.

That would cause much confusion.  As Frank has suggested, you can always
do such optimizations on your own.

> - When there are psw. hashes cracked in session, does john skip cracking them 
> immediatelly?

Yes, it won't be wasting any time trying to crack whatever it has
already cracked.

> Or is there a session reload needed?

No, except when you have multiple instances of John share the same
john.pot (which is supported).

> 4. Other improvements
> - I use adhoc "wordlist" rules a lot, could there be some kind of this 
> functionality: -rules:wordlist_set_1 for selecting wordlist rules??

Yes, it's already on TODO for post-1.7.  I'm not sure about the exact
syntax and semantics that I'd implement yet.

> - Is it possible to execute mmx and non-mmx instructions in x86 CPUs in 
> parallel? If so would it be possible get advantage of it in john?

Yes - for some instruction pairs and, consequently, for some hash types.
On some CPUs, ALU+SSE, MMX+SSE, and even ALU+MMX+SSE could be beneficial,
too - and people have been doing that.  There's definitely room for
improvement here.

However, please note that the current MMX code in John does use several
of the regular non-MMX registers (for array subscripts, etc.) - so there
are not many registers available to do other calculations.  Also, the
non-MMX instruction set and timings are less suitable for bitslice DES.

> - The cygwin version of john can't access files (ie. wordlists) larger then 2 
> GB, would you consider a more win32 native/friendly version of john? (mingw?)

This has nothing to do with Cygwin or Windows.

John does not implement LFS, so it won't support files larger than 2 GB
on 32-bit systems (but will happily work with larger files when built
natively on 64-bit systems).

I _might_ implement LFS eventually, if it would still make sense (that
is, if people would be seriously using John on 32-bit systems by the time).

> - I think that the complexity gap between wordlist mode (even with rules) and 
> incremental modes is too big, there is a "undiscovered country" of various 
> optimizations/complexity reductions etc. Ie. like generating only "pronouncable" 
> passwords

I disagree.

There used to be an optional built-in filter for word-like passwords in
ancient versions of John, but when I improved "incremental" mode for
John 1.5+, I dropped that for good.

John effectively stores statistical information on trigraphs (and on
arbitrary trigrams) in .chr files.  I don't think you can be generally
smarter than that when programming a filter manually.  Besides, you
might want to be trying the less likely candidate passwords later on.

If you _really_ want to filter out non-pronounceable passwords, you can
do that with an external filter() that you would specify in john.conf,
then invoke John like:

	./john -i=alpha -e=filter_pr0n passwd

For a better c/s rate (but a highly non-optimal order of tries), you
could define an entire external mode that would produce only
pronounceable passwords.

But I really do not recommend that you do that, except in very few
special cases, -- e.g., enforcing a particular password policy or
exploiting the properties of a password generator:

> (http://www.multicians.org/thvv/gpw.html)

Now, it could make sense to match a very particular password generator
like that with a John external mode.  The "Strip" cracker currently in
the default john.conf (of 1.6.40) does something similar.  The same
could be done to search the keyspaces used by the various pronounceable
password generators (those keyspaces are often not large enough).

Additionally, many of those password generators do not produce a uniform
distribution.  So a combination of a .chr file generated from lots of
actual passwords produced by the particular generator and an external
filter() which would make John obey the rules of that generator might
work best.

> or the recent addition of alpha-numeric charset.

Actually, in most cases all.chr should be used.  I believe most of the
uses of the new alnum.chr would be because of a lack of understanding of
how "incremental" mode works.  But the demand for this was just too
high so I got tired of explaining why this was not needed except in very
few cases.  (I've been reminded that one of those cases is testing for
compliance to password policies that require alphanumeric plus one shift
character.)

> Another good thing to do would be to re-feed the cracked passwords

Yes, this is on TODO for post-1.7 - but no promises (there are way too
many items on TODO).

> with the single-mode rules.

As Frank has pointed out, some of the "single crack" mode rules may only
be used on word pairs.

> 5. Time-memory trade-off for Traditional DES
> - I know. I know that there are salts that make it less effective, but I think 
> it's still worth (I assume 4096-times more memory needed for the equivalent of 
> non-salt performance, or 1024-times more memory for 1/16th of non-salt 
> performance. 
> Ophcrack (http://lasecwww.epfl.ch/~oechslin/projects/ophcrack/) uses 701 MB of 
> tables for alphanumeric LM passwords which is comparable to smallalphanumeric 
> Traditional Unix DES and 701 GB is quite acceptable storage space today, much 
> more tomorrow. Even pure smallalphabetical table would be beneficial and much 
> less demanding.)

Yes, this could be done, but I think that would be a separate program.
I could roll it into John the Ripper for "marketing" reasons, though. ;-)

(No, I am not planning to work on that in the near future.  But I do not
rule out the possibility either.)

> 6. There was mentioned a non-public sse-optimized version of john in this forum. 
>  Is there a chance we (public) will ever get our hands on it?

Yes, definitely.

> If so, when?

I don't really know.  This is not planned for the 1.7 release, but it
will likely appear in 1.8, or maybe in a "1.7 Pro".

The current code is suboptimal and there's no proper SSE detection and
fallback to MMX (rather, the program would simply segfault if the CPU
and/or OS would lack SSE support).  It would also run _slower_ than MMX
on all CPUs except for Intel Pentium 4 (and its flavors, including P4
Celerons and P4 Xeons), so maybe those would need to be detected, too.
Yes, I realize that P4s are indeed widespread.

Thanks again,

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.