Date: Fri, 13 Oct 2006 23:32:52 +0400
From: Solar Designer <>
Subject: Re: JTR and os X macintel

A week ago, Randy posted some benchmarks for a 64-bit build on the new
Xeon, demonstrating excellent performance:

On Thu, Oct 05, 2006 at 02:17:38PM -0500, Randy B wrote:
> model name      : Intel(R) Xeon(R) CPU            5160  @ 3.00GHz

> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:     2859K c/s real, 2859K c/s virtual
> Only one salt:  2395K c/s real, 2395K c/s virtual

> Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
> Raw:    12783 c/s real, 12783 c/s virtual
> Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
> Raw:    462 c/s real, 461 c/s virtual

> Benchmarking: NT LM DES [128/128 BS SSE2-16]... DONE
> Raw:    15908K c/s real, 15908K c/s virtual

Randy was kind enough to also run benchmarks for a 32-bit build on the
same CPU and to e-mail the results to me.  I'll summarize the results as
follows:

My SSE2-16 bitslice DES code introduced in 1.7.2 (that is, code making
use of the 16 XMM registers as available in 64-bit mode) is actually
around 10% faster than plain SSE2 code on that CPU.  So we finally have
proof that the time I had invested into the SSE2-16 code hadn't been
wasted.  (There was no such obvious advantage with SSE2-16 on older
x86-64 CPUs that I've tried running JtR benchmarks on.)
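For readers unfamiliar with the term, here is a minimal sketch of what
"bitslice" means.  It is not JtR's DES code: the names slice_t, bs_gate
and gate_one are made up for illustration, and a portable 64-bit word
stands in for a 128-bit XMM register.  Each bit position of a word
belongs to a different, independent instance, so one bitwise operation
evaluates the same logic gate for all instances at once.  A bitslice DES
implementation keeps many such words live at the same time, which is
presumably why having 16 vector registers rather than 8 can help.

```c
#include <stdint.h>

/* Hypothetical illustration: 64 lanes per word.  An SSE2 build would
 * hold 128 lanes in one XMM register instead. */
typedef uint64_t slice_t;

/* Bitslice form: bit i of every argument belongs to instance i, so this
 * single expression computes (a AND b) XOR c for all 64 instances. */
slice_t bs_gate(slice_t a, slice_t b, slice_t c)
{
    return (a & b) ^ c;
}

/* Scalar reference: the same gate for one instance, with 1-bit inputs. */
unsigned gate_one(unsigned a, unsigned b, unsigned c)
{
    return (a & b) ^ c;
}
```

Extracting bit i of the bs_gate() result gives the same value as calling
gate_one() on bit i of each input.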

As expected, the FreeBSD-style MD5-based hashes benchmark shows about
50% higher c/s rate in the 64-bit build.  This is because the
availability of extra registers makes it possible to efficiently compute
two instances of MD5 in parallel (and that's with pure C code vs.
assembly code in the 32-bit build).
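To illustrate the idea of computing two instances in parallel, here is a
rough sketch in C.  It is not the code from JtR: the type md5_state and
the functions md5_steps_x1 and md5_steps_x2 are hypothetical names, and
only the first four steps of MD5 round 1 are shown rather than a full
compression function.  The point is that the two instances have no data
dependencies on each other, so their steps can be interleaved, provided
there are enough registers to hold both sets of working variables.

```c
#include <stdint.h>

/* MD5 round-1 boolean function and 32-bit left rotate. */
#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define ROTL(v, s) (((v) << (s)) | ((v) >> (32 - (s))))

/* One MD5 step: a = b + rotl(a + F(b, c, d) + m + k, s). */
#define STEP(a, b, c, d, m, k, s) \
    (a) += F((b), (c), (d)) + (m) + (k); \
    (a) = ROTL((a), (s)); \
    (a) += (b);

/* Hypothetical container for the four MD5 working variables. */
typedef struct { uint32_t a, b, c, d; } md5_state;

/* First four steps of round 1 for a single instance. */
void md5_steps_x1(md5_state *st, const uint32_t m[4])
{
    uint32_t a = st->a, b = st->b, c = st->c, d = st->d;

    STEP(a, b, c, d, m[0], 0xd76aa478u, 7)
    STEP(d, a, b, c, m[1], 0xe8c7b756u, 12)
    STEP(c, d, a, b, m[2], 0x242070dbu, 17)
    STEP(b, c, d, a, m[3], 0xc1bdceeeu, 22)

    st->a = a; st->b = b; st->c = c; st->d = d;
}

/* The same four steps for two independent instances, interleaved.
 * Eight working variables are live at once instead of four. */
void md5_steps_x2(md5_state *s0, const uint32_t m0[4],
                  md5_state *s1, const uint32_t m1[4])
{
    uint32_t a0 = s0->a, b0 = s0->b, c0 = s0->c, d0 = s0->d;
    uint32_t a1 = s1->a, b1 = s1->b, c1 = s1->c, d1 = s1->d;

    STEP(a0, b0, c0, d0, m0[0], 0xd76aa478u, 7)
    STEP(a1, b1, c1, d1, m1[0], 0xd76aa478u, 7)
    STEP(d0, a0, b0, c0, m0[1], 0xe8c7b756u, 12)
    STEP(d1, a1, b1, c1, m1[1], 0xe8c7b756u, 12)
    STEP(c0, d0, a0, b0, m0[2], 0x242070dbu, 17)
    STEP(c1, d1, a1, b1, m1[2], 0x242070dbu, 17)
    STEP(b0, c0, d0, a0, m0[3], 0xc1bdceeeu, 22)
    STEP(b1, c1, d1, a1, m1[3], 0xc1bdceeeu, 22)

    s0->a = a0; s0->b = b0; s0->c = c0; s0->d = d0;
    s1->a = a1; s1->b = b1; s1->c = c1; s1->d = d1;
}
```

Calling md5_steps_x2() on two states gives the same results as calling
md5_steps_x1() on each state separately; only the instruction scheduling
differs.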

The OpenBSD-style Blowfish-based hashes benchmark, however, shows a 10%
higher c/s rate in the 32-bit build.  This is because the assembly code
for Blowfish is still a good fit even for this new CPU, and the 64-bit
build can't use that old 32-bit assembly code (so it's pure C), while
the algorithm is the same in both builds (it does not try to compute two
hashes in parallel).  I intend to correct that in future versions of JtR.

Alexander Peslyak <solar at>
GPG key ID: 5B341F15  fp: B3FB 63F4 D7A3 BCCC 6F6E  FC55 A2FC 027C 5B34 1F15 - bringing security into open computing environments
