Message-ID: <20060511050048.GA27597@openwall.com>
Date: Thu, 11 May 2006 09:00:48 +0400
From: Solar Designer <solar@...nwall.com>
To: announce@...ts.openwall.com, john-users@...ts.openwall.com
Subject: John the Ripper 1.7.1

Hi,

I've proceeded with further development of John the Ripper after the 1.7
release.  A new development version is out - numbered 1.7.1:

	http://www.openwall.com/john/

JtR 1.7.1 adds bitslice DES code for x86 with SSE2 for better
performance at DES-based crypt(3) hashes on Pentium 4 and SSE2-capable
AMD processors, as well as assorted high-level changes to improve
performance on current x86-64 processors (both AMD and Intel).
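
For readers unfamiliar with the term, here is a toy sketch of the general bitslicing idea. It is not John's actual DES code: the gate function and all names are made up for illustration. Each bit position of a machine word carries one independent instance, so a single pass of bitwise operations evaluates the same boolean function for 64 instances at once (or 128 with an SSE2-sized register).

```python
# Toy illustration of bitslicing (hypothetical names, not JtR source code).
# Each Python int stands in for a WIDTH-bit machine word; bit i of every
# word belongs to independent instance ("lane") i.

WIDTH = 64  # 64 for a 64-bit or MMX register, 128 for an SSE2 register
MASK = (1 << WIDTH) - 1


def pack(bits):
    """Pack one bit per instance into a single word (instance i -> bit i)."""
    word = 0
    for i, b in enumerate(bits):
        word |= (b & 1) << i
    return word


def unpack(word):
    """Inverse of pack(): return the list of per-instance bits."""
    return [(word >> i) & 1 for i in range(WIDTH)]


def toy_gate(a, b, c):
    """A made-up boolean function: (a XOR b) AND (NOT c).

    It works on single bits and on packed words alike, which is the point:
    the same few bitwise ops serve all WIDTH lanes in one go.
    """
    return (a ^ b) & (~c & MASK)


def check_all_lanes():
    """Compare the bitsliced result against a lane-by-lane evaluation."""
    # Cycle through all 8 input combinations across the lanes.
    a_bits = [(i >> 0) & 1 for i in range(WIDTH)]
    b_bits = [(i >> 1) & 1 for i in range(WIDTH)]
    c_bits = [(i >> 2) & 1 for i in range(WIDTH)]
    sliced = unpack(toy_gate(pack(a_bits), pack(b_bits), pack(c_bits)))
    scalar = [toy_gate(a, b, c) for a, b, c in zip(a_bits, b_bits, c_bits)]
    return sliced == scalar


print(check_all_lanes())
```

A real bitslice DES expresses the S-boxes as long sequences of such gates. The wider the register, the more candidate passwords each gate processes, which is why a 128-bit SSE2 register can be attractive.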

On a related note, the SecurityFocus interview with me on John the
Ripper 1.7 is now also available off the Openwall website:

	http://www.openwall.com/john/interviews/SF-20060222-p1

For those who are interested in some benchmarks of the new code, here
they are.  I've used two systems, one with an Intel P4 Xeon (3.2 GHz)
and the other with an AMD Athlon 64 ("3200+", 2.0 GHz).  Although the
Xeon is capable of Hyper-Threading, I only ran one process, thereby not
taking advantage of HT for these benchmarks.  Both CPUs are SSE2 and
64-bit capable.  The OS on both systems was Linux and the same builds of
John were used (I copied my pre-compiled executables to both systems).

I've omitted the "BSDI DES" and "Kerberos AFS DES" benchmarks to make it
easier to see the really important ones.  The "BSDI DES" results are in
all cases proportional to the "Traditional DES" ones (as expected) and
the "Kerberos AFS DES" implementation is suboptimal and unimportant to
most users of John.

I'll start with the Xeon:

vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      :                   Intel(R) Xeon(TM) CPU 3.20GHz
stepping        : 3

Native 64-bit (pure C, built on Owl-current for x86-64, gcc 3.4.5):

Benchmarking: Traditional DES [64/64 BS]... DONE
Many salts:     949593 c/s real, 949593 c/s virtual
Only one salt:  875699 c/s real, 877454 c/s virtual

Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    10106 c/s real, 10106 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    450 c/s real, 450 c/s virtual

Benchmarking: NT LM DES [64/64 BS]... DONE
Raw:    8848K c/s real, 8848K c/s virtual

The DES performance is rather good and Blowfish is OK, but it's the
performance at FreeBSD-style MD5-based crypt(3) that stands out.
Most CPUs don't cross 10k c/s at this benchmark.  This one does due to
the high clock rate and the availability of 16 registers with x86-64,
which enables John to do two MD5 hashes in parallel, even with pure C
code.

32-bit with SSE2 build on the Xeon:

Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts:     924518 c/s real, 924518 c/s virtual
Only one salt:  814592 c/s real, 814592 c/s virtual

Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw:    7069K c/s real, 7069K c/s virtual

Although SSE2 is effectively 128-bit, this is a little bit slower than
the native 64-bit build.  It does, however, have the advantage of not
requiring a 64-bit capable CPU or OS.  Similar performance is expected
on non-Xeon P4s and on P4 Celerons that are not 64-bit capable.

32-bit with MMX build on the Xeon:

Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts:     654080 c/s real, 654080 c/s virtual
Only one salt:  599385 c/s real, 599385 c/s virtual

Benchmarking: NT LM DES [64/64 BS MMX]... DONE
Raw:    6521K c/s real, 6521K c/s virtual

As you can see, both DES-based hashes were faster with SSE2.  In the
case of the traditional DES-based crypt(3), the difference is 35% to 40%
in favor of the new SSE2 implementation.  (On older Pentium 4 CPUs, the
MMX code is faster than the above per-MHz, so the advantage of SSE2 may
be smaller.)
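
For anyone who wants to recheck the percentages, a small sketch follows. The c/s figures are the "real" Traditional DES numbers quoted above for the Xeon; the helper name is just something made up for this example.

```python
def speedup_percent(new_cps, old_cps):
    """Return how much faster new_cps is than old_cps, in percent."""
    return (new_cps / old_cps - 1.0) * 100.0


# Traditional DES on the Xeon, c/s real, copied from the benchmarks above.
XEON_SSE2 = {"many salts": 924518, "one salt": 814592}
XEON_MMX = {"many salts": 654080, "one salt": 599385}

for case, sse2_cps in XEON_SSE2.items():
    gain = speedup_percent(sse2_cps, XEON_MMX[case])
    print(f"{case}: SSE2 is {gain:.1f}% faster than MMX")
```

This gives roughly 41% for many salts and 36% for one salt, in line with the range stated above.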

For the sake of completeness, the other two benchmarks from the 32-bit
builds (they are the same since these use neither SSE2 nor MMX):

Benchmarking: FreeBSD MD5 [32/32]... DONE
Raw:    9159 c/s real, 9159 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw:    453 c/s real, 454 c/s virtual

Here MD5 is a little bit slower than in the 64-bit build because only
8 registers are available in 32-bit mode and only one hash is computed
at a time.

Now the Athlon 64:

vendor_id       : AuthenticAMD
cpu family      : 15
model           : 47
model name      : AMD Athlon(tm) 64 Processor 3200+
stepping        : 2

Native 64-bit:

Benchmarking: Traditional DES [64/64 BS]... DONE
Many salts:     791219 c/s real, 791219 c/s virtual
Only one salt:  720435 c/s real, 720435 c/s virtual

Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    7419 c/s real, 7419 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw:    330 c/s real, 330 c/s virtual

Benchmarking: NT LM DES [64/64 BS]... DONE
Raw:    6638K c/s real, 6638K c/s virtual

This is rather good considering that the real clock rate is only 2.0 GHz,
but it is slower than the Xeon.  So the "3200+" rating does not hold for
this benchmark.
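
To put the clock rate remark into numbers, here is a quick per-GHz comparison. It is only a sketch: it uses the native 64-bit "Traditional DES, many salts" figures and the nominal clock rates given above, and per-GHz throughput is a rough yardstick, not a precise measure of CPU efficiency.

```python
# (c/s real, clock rate in GHz) for the native 64-bit builds,
# Traditional DES with many salts, as quoted above.
SYSTEMS = {
    "Xeon 3.2 GHz": (949593, 3.2),
    "Athlon 64 2.0 GHz": (791219, 2.0),
}


def cps_per_ghz(cps, ghz):
    """Normalize a c/s figure by the nominal clock rate."""
    return cps / ghz


for name, (cps, ghz) in SYSTEMS.items():
    print(f"{name}: {cps_per_ghz(cps, ghz):,.0f} c/s per GHz")

# Absolute throughput of the Athlon 64 relative to the Xeon.
athlon_vs_xeon = SYSTEMS["Athlon 64 2.0 GHz"][0] / SYSTEMS["Xeon 3.2 GHz"][0]
print(f"Athlon 64 relative to the Xeon: {athlon_vs_xeon:.0%}")
```

Per GHz the Athlon 64 comes out about a third ahead (about 396k vs. 297k c/s per GHz), yet in absolute terms it reaches only about 83% of the Xeon's speed here.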

However, with SSE2 things are better:

Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts:     951193 c/s real, 951193 c/s virtual
Only one salt:  827776 c/s real, 827776 c/s virtual

Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw:    6474K c/s real, 6474K c/s virtual

Now we're at the same level of performance that the Xeon provides for
DES-based crypt(3).

For comparison against previous versions of John, the MMX build:

Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts:     785318 c/s real, 785318 c/s virtual
Only one salt:  703667 c/s real, 703667 c/s virtual

Benchmarking: NT LM DES [64/64 BS MMX]... DONE
Raw:    6503K c/s real, 6503K c/s virtual

As you can see, this is roughly 15% to 17% slower than SSE2 at DES-based
crypt(3) (put differently, SSE2 is around 20% faster), achieving about
the same performance that the native 64-bit build does.  However, the
performance at LM hashes is similar for all three builds (unlike on the
Xeon).

Finally, for the sake of completeness, the other two benchmarks for the
32-bit builds:

Benchmarking: FreeBSD MD5 [32/32]... DONE
Raw:    5935 c/s real, 5935 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw:    360 c/s real, 360 c/s virtual

Overall, the new SSE2 code may provide a speedup of up to 40% on current
CPUs for DES-based crypt(3) (both traditional and BSDI-style), but its
effect on LM hashes is not always positive.  Future versions of JtR
might provide support for SSE2 with 64-bit builds and improvements for
LM hashes.

Comments are welcome on the john-users mailing list.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments
