Date: Thu, 23 Jun 2011 02:53:03 +0400
From: Solar Designer <>
Subject: John the Ripper 1.7.8: DES speedup


Earlier today, I released John the Ripper 1.7.8, and I have just made
the updated -omp-des patches for it available as well.  (A jumbo patch
update is to be announced separately.)

This release has been sponsored by Rapid7, a leading provider of unified
vulnerability management and penetration testing solutions:

As a few of you might be aware, Roman Rusakov and I have been working on
new DES S-box expressions and program code, with Rapid7's sponsorship.
The primary idea was Roman's, and he did all the work to generate the
S-box expressions (which took months on his overclocked water-cooled
quad-core machine with 24 GB RAM).  My humble contribution was code
generation and feedback to Roman so that we'd have not only the
smallest gate count, but also decent program code (not requiring too
many registers, reasonably efficient on 2-operand architectures, yet
containing inherent parallelism).  In the end, we had thousands of
same-gate-count "circuits" to choose from for some of the S-boxes and
some of the target instruction sets.

Well, as you have guessed by now, John the Ripper 1.7.8 replaces the
S-box expressions with Roman's, and the corresponding code with mine
(where applicable).  Being mathematical formulas, the S-box expressions
are not copyrighted and are free for reuse by anyone.  The corresponding
program code I have placed under a cut-down BSD license.  It is our
intent to encourage reuse of both the S-box expressions and their
corresponding program code, including in "competing" password security
auditing programs.

Speaking of gate counts, the new S-box expressions offer a 17%
improvement over the corresponding previous best results (which we've
been using in John the Ripper so far).  Specifically, for the
instruction set of typical x86 CPUs (MMX, SSE2, AVX), Matthew Kwan's
S-box expressions (generated in 1998) required an average of 53.375
gates per S-box (XNOR gates had to be substituted with pairs of other
gates).  Roman's S-box expressions need only 44.125 gates per S-box.
Similarly, for CPUs/GPUs with "bit select" instructions (Cell, PowerPC
with AltiVec, AMD XOP, high-end ATI GPUs), the previous best result by
Dango-Chu was 39.875 gates.  This is now improved to 32.875.
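
To illustrate why the gate count depends on the target instruction set,
here is a minimal Python sketch of two bitslice "gates", with Python
integers standing in for 128-bit vector registers.  The names WIDTH,
MASK, xnor and bit_select, as well as the operand order of bit_select,
are assumptions made for this illustration and are not taken from the
John the Ripper source.

```python
WIDTH = 128                  # assumed register width (one SSE2/AVX register)
MASK = (1 << WIDTH) - 1      # keeps results within WIDTH bits


def xnor(a, b):
    """XNOR without a native instruction: an XOR followed by a NOT (2 gates)."""
    return ~(a ^ b) & MASK


def bit_select(a, b, c):
    """Per-bit multiplexer: take the bit from a where c is 1, else from b.

    Spelled out with AND/OR/NOT this costs several gates; an instruction
    set with a native "bit select" does the same work as a single gate.
    """
    return ((a & c) | (b & ~c)) & MASK
```

Each bit position of the integers carries an independent DES instance,
so one call to either function processes WIDTH instances at once.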

Looking at it another way, the S-box expressions used to be 21% larger.
This is not just a marketing figure, it is actually relevant: if the
program code consisted solely of the S-boxes, this (and not the smaller
17% figure) would be the potential speedup.
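
The 17% and 21% figures are two views of the same ratio, which the
following short Python sketch reproduces from the average gate counts
quoted above.  The function name improvement is hypothetical.

```python
def improvement(old, new):
    """Return (reduction relative to old, excess of old relative to new)."""
    return (old - new) / old, old / new - 1


for name, old, new in [
    ("x86, no bit select", 53.375, 44.125),
    ("with bit select", 39.875, 32.875),
]:
    smaller, larger = improvement(old, new)
    print(f"{name}: new is {smaller:.1%} smaller, old was {larger:.1%} larger")
```

Both instruction set classes come out at roughly 17% smaller, or
equivalently the old expressions were roughly 21% larger.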

In practice, though, a 12% to 14% speedup for DES-based crypt(3) hashes
is typical.  Here's the new benchmark on a Core i7-2600K 3.4 GHz under
Ubuntu 11.04, using just one CPU core (not an OpenMP build):

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     5731K c/s real, 5788K c/s virtual
Only one salt:  4647K c/s real, 4647K c/s virtual

The previous version, 1.7.7, achieved about 5000K c/s in the "many
salts" benchmark on this machine.

Here's how this is affected by the -fast-des-key-setup patch (available
for the new 1.7.8 already):

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     5723K c/s real, 5781K c/s virtual
Only one salt:  5518K c/s real, 5518K c/s virtual

(the "one salt" speed increases).

With OpenMP, -omp-des-4 exceeds 20 million hash computations per
second for the typical "many salts" case:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     20668K c/s real, 2593K c/s virtual
Only one salt:  8724K c/s real, 1094K c/s virtual

That's for 8 threads on this quad-core CPU with SMT.

(By the way, this corresponds to over 500 million DES block
encryptions per second, or a data encryption speed of 33 Gbps, if we
were encrypting data.  Of course, in practice there would be other
limitations, such as data transfer bandwidth.  But the crypto code and
the CPU are this fast.)
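
The block encryption and Gbps figures can be reproduced with a short
Python sketch, assuming they are derived from the "many salts" rate
above, the 25 DES iterations that traditional DES-based crypt(3)
performs per hash, and the 64-bit DES block size.  The constant and
function names are hypothetical.

```python
CRYPTS_PER_SECOND = 20_668_000   # "many salts" figure quoted above
DES_ITERATIONS_PER_HASH = 25     # traditional DES-based crypt(3)
BLOCK_BITS = 64                  # DES block size


def des_throughput(crypts_per_second):
    """Return (DES block encryptions per second, equivalent bits per second)."""
    blocks = crypts_per_second * DES_ITERATIONS_PER_HASH
    return blocks, blocks * BLOCK_BITS


blocks, bits = des_throughput(CRYPTS_PER_SECOND)
print(f"{blocks / 1e6:.0f} million blocks/s, {bits / 1e9:.1f} Gbps")
```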

-omp-des-7 achieves decent single-salt speed:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     19759K c/s real, 2479K c/s virtual
Only one salt:  15777K c/s real, 1982K c/s virtual

(I was hoping to merge those patches, but I ran out of time.  Maybe next
time.  For now, they're available as separate patches, but properly
updated for the new version of JtR, so they are easy to apply and use.)

Other changes in 1.7.8 are:

* Corrected support for bcrypt (OpenBSD Blowfish) hashes of passwords
containing non-ASCII characters (that is, characters with the 8th bit
set).  Added support for such hashes produced by crypt_blowfish up to
1.0.4, which contained a sign extension bug (inherited from older
versions of John).  The old buggy behavior may be enabled per-hash,
using the "$2x$" prefix.

* The external mode virtual machine's performance has been improved
through additional multi-op instructions matching common instruction
sequences (assign-pop and some triple- and quad-push VM instructions
were added).

* A new sample external mode has been added to the default john.conf:
AppendLuhn, which appends the Luhn algorithm digit to arbitrary
all-digit strings.

* A few minor bug fixes and enhancements were made.
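
To make the sign extension bug from the bcrypt item above more
concrete, here is a simplified Python model of how key bytes are packed
into 32-bit words during Blowfish key setup.  It is a sketch for
illustration only, not the crypt_blowfish code itself, and the function
name expand_key_word and its buggy parameter are hypothetical.

```python
def expand_key_word(key, pos=0, buggy=False):
    """Pack 4 key bytes into a 32-bit word, cycling over key plus a NUL.

    With buggy=True, bytes with the 8th bit set are sign-extended before
    being ORed in, which overwrites the bytes already packed into the word.
    """
    data = key + b"\0"
    word = 0
    for _ in range(4):
        byte = data[pos % len(data)]
        word = (word << 8) & 0xFFFFFFFF
        if buggy and byte >= 0x80:
            word |= 0xFFFFFF00 | byte   # sign-extended to 32 bits
        else:
            word |= byte
        pos += 1
    return word, pos


good, _ = expand_key_word(b"ab\xa3")
bad, _ = expand_key_word(b"ab\xa3", buggy=True)
other, _ = expand_key_word(b"xy\xa3", buggy=True)
print(hex(good), hex(bad), hex(other))
```

In this model the characters preceding the 8-bit one no longer
influence the word in buggy mode, so different passwords can produce
the same expanded key, which is why hashes produced that way need to be
treated separately.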
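
For readers unfamiliar with the Luhn algorithm mentioned in the
AppendLuhn item above, here is a minimal Python sketch of computing and
appending the check digit.  The actual sample is written in the
external mode language in john.conf; this Python version and its
function names are illustrative assumptions only.

```python
def luhn_check_digit(digits):
    """Return the Luhn check digit (as a string) for an all-digit string."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9          # same as summing the two digits of d
        total += d
    return str((10 - total % 10) % 10)


def append_luhn(digits):
    """Append the Luhn check digit to an all-digit string."""
    return digits + luhn_check_digit(digits)


print(append_luhn("7992739871"))
```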

I am a few hours late in sending this announcement here, so there's
some press coverage of the new John the Ripper release already:

Formal press release:

"Hello Ripper!" - a Rapid7 community blog post, by Jen Ellis:

'John The Ripper' Gets A Facelift - a Dark Reading news story by Kelly
Jackson Higgins:

Enjoy, reuse, and don't forget to provide feedback.
