Date: Thu, 27 Apr 2006 19:39:36 +0400
From: Solar Designer <>
Subject: Re: Performance tuning

On Thu, Apr 27, 2006 at 05:45:46PM +0200, Ami Schwartzman wrote:
> 1) More of a question to Solar Designer: I saw that you provide only the 
> UNIX source, and we might (it's not sure yet) be doing it in Windows.  
> Is it possible for you to provide us the Windows source?

Actually, the only reason the download link for the sources has "Unix"
in it is to ensure that Windows (l)users don't download the sources by
mistake (and quite a few of them do anyway...) ;-)

The official source tarball of JtR includes everything needed for
building on Win32 with Cygwin.

> I'm asking 
> because we would eventually release the source back to the community.  
> Will you have a problem with that?

Of course not, as long as you don't violate the GPL.

> 2)This is to everyone who knows the source: Do you think there is room 
> for improvement?


> I'm asking because I understand that DES is pretty 
> hard to parallelize.

You don't need to parallelize individual instances of DES (or of other
underlying ciphers and hashes that John uses).  With password cracking,
there are multiple candidate passwords to be tried - so you can try more
than one in parallel.  John already does this sort of parallelization
with much success.
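
A minimal sketch of what "trying more than one candidate in parallel"
can look like with bitslicing, the technique John's DES code uses: bit j
of 64 independent inputs is packed into one 64-bit word, so each bitwise
operation on those words evaluates a gate for all 64 inputs at once.
The 4-bit function here is a hypothetical toy, not a real DES S-box, and
the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy 4-bit -> 1-bit function (NOT a real DES S-box). */
static unsigned toy_scalar(unsigned x)
{
    unsigned b0 = x & 1, b1 = (x >> 1) & 1, b2 = (x >> 2) & 1, b3 = (x >> 3) & 1;
    return ((b0 & b1) ^ (b2 | b3)) & 1;
}

/* The same function for 64 inputs at once: plane[j] holds bit j of
 * every input, and bit i of each plane belongs to input i. */
static uint64_t toy_bitslice(const uint64_t plane[4])
{
    return (plane[0] & plane[1]) ^ (plane[2] | plane[3]);
}

/* Transpose 64 4-bit inputs into 4 bit planes. */
static void to_planes(const uint8_t in[64], uint64_t plane[4])
{
    for (int j = 0; j < 4; j++) {
        plane[j] = 0;
        for (int i = 0; i < 64; i++)
            plane[j] |= (uint64_t)((in[i] >> j) & 1) << i;
    }
}

/* Check every lane of the bitsliced result against the scalar version. */
static void verify(const uint8_t in[64])
{
    uint64_t plane[4];
    to_planes(in, plane);
    uint64_t out = toy_bitslice(plane);
    for (int i = 0; i < 64; i++)
        assert(((out >> i) & 1) == toy_scalar(in[i]));
}
```

The transposition is overhead; real bitslice code keeps data in
bitsliced form across the whole cipher so that this cost is amortized.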

> Do you think threads,

If you want the absolute best performance, then separate processes are
likely to work better than threads.  As soon as you start using
threads, you have to ensure that different threads store their private
data at different virtual addresses - which may require the use of an
extra register and more complicated addressing modes.  I'd estimate the
slowdown for making the low-level crypto code in John thread-safe at 0%
to 10% for different ciphers/hashes and processor architectures.  For
bcrypt (the Blowfish-based password hashing method), I actually did the
conversion to thread-safe code for the crypt_blowfish package.  The
measured slowdown is 2% to 10% on different architectures.
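
To illustrate where the extra register comes from, here is a sketch
with a hypothetical mixing function (not John's or crypt_blowfish's
actual code).  The non-thread-safe version keeps its working state in a
static variable at a fixed address, which the compiler can typically
encode directly in each instruction.  The thread-safe version receives a
pointer to per-thread state, so every access becomes base-plus-offset
relative to a register holding that pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-candidate working state. */
struct ctx {
    uint32_t s[4];
};

/* Not thread-safe: one copy of the state at a fixed address,
 * shared by every caller. */
static struct ctx global_ctx;

static uint32_t mix_global(uint32_t x)
{
    global_ctx.s[0] = x;
    for (int i = 1; i < 4; i++)
        global_ctx.s[i] = global_ctx.s[i - 1] * 2654435761u + (uint32_t)i;
    return global_ctx.s[3];
}

/* Thread-safe: each thread passes its own state.  Every access now goes
 * through the pointer c, which typically occupies a register. */
static uint32_t mix_ctx(struct ctx *c, uint32_t x)
{
    assert(c != NULL);
    c->s[0] = x;
    for (int i = 1; i < 4; i++)
        c->s[i] = c->s[i - 1] * 2654435761u + (uint32_t)i;
    return c->s[3];
}
```

Each thread would own its own struct ctx (for example, on its stack), so
no locking is needed; the cost is the pointer register and the
base-plus-offset addressing.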

> making it run on two cores, 

Of course, this would help - to the same extent that starting two
instances of John does.  If done properly, the combined c/s rate would
almost double.  It is important to ensure that the order in which
candidate passwords are tried is not significantly affected by the
SMP/multi-core support.  Otherwise, such support would be little or no
better than starting two instances of John manually - which is
supported already.
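
One simple way to keep the combined order close to the single-process
order (a hypothetical scheme, not something the post says John
implements) is to deal candidates out round-robin: process k of N tries
candidates k, k+N, k+2N, ... of the sequence a single instance would
try.  After each process has tried n candidates, together they have
covered exactly the first N*n candidates.  Splitting the sequence into
contiguous ranges instead would have the second process start on
candidates a single instance would only reach much later.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: position, in the single-process order, of the n-th
 * candidate (0-based) tried by process k out of nprocs when candidates
 * are dealt out round-robin. */
static size_t interleaved_index(size_t k, size_t nprocs, size_t n)
{
    assert(nprocs > 0 && k < nprocs);
    return n * nprocs + k;
}

/* For comparison: a contiguous split of `total` candidates (any
 * remainder is ignored for brevity). */
static size_t contiguous_index(size_t k, size_t nprocs, size_t total, size_t n)
{
    assert(nprocs > 0 && k < nprocs);
    return k * (total / nprocs) + n;
}
```

With two processes and three candidates each, the interleaved scheme has
covered positions 0 to 5, whereas with a contiguous split of 1000
candidates the second process begins at position 500.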

> or utilizing SSE2/3 for example could improve performance?

It is straightforward to hack the bitslice DES code currently in John to
use SSE instead of MMX - and I have that code already.  For DES-based
crypt(3) hashes, it runs 10% to 40% faster on Pentium 4 (including P4
Celerons and P4 Xeons).  For LM hashes, it runs a little bit faster on
some P4 models and a little bit slower on others.  On Pentium 3 and on
AMD processors, the SSE code is slower in all cases (for the benchmarks
I've performed or have seen so far).  I imagine that on future
processors SSE will provide a bigger advantage.  Also, there's likely
room for improvement of the SSE code converted from MMX.
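
MMX registers are 64 bits wide and SSE registers 128 bits wide, so in
bitslice code each logical instruction covers 64 or 128 candidates,
respectively.  The sketch below shows one bitslice gate over 128 lanes.
It assumes SSE2 integer intrinsics (the SSE code mentioned above may use
different instructions) and falls back to two 64-bit halves otherwise;
the lanes128 type and helper names are hypothetical.  As the benchmarks
above show, the doubled width does not automatically translate into a
speedup on every processor.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* 128 bitslice lanes: bit i holds one bit of candidate password i. */
typedef struct { uint64_t lo, hi; } lanes128;

/* Return the bit belonging to lane i (0..127). */
static unsigned lane_bit(lanes128 x, unsigned i)
{
    assert(i < 128);
    return (unsigned)(i < 64 ? (x.lo >> i) & 1 : (x.hi >> (i - 64)) & 1);
}

#ifdef __SSE2__
static __m128i load128(lanes128 x)
{
    uint64_t t[2] = { x.lo, x.hi };   /* low 64 bits first (little-endian) */
    return _mm_loadu_si128((const __m128i *)t);
}
#endif

/* One bitslice gate, (a & b) ^ c, evaluated for 128 candidates at once.
 * With SSE2 this is one AND and one XOR on 128-bit registers; 64-bit
 * MMX code would need two of each to cover the same 128 candidates. */
static lanes128 gate128(lanes128 a, lanes128 b, lanes128 c)
{
    lanes128 r;
#ifdef __SSE2__
    __m128i v = _mm_xor_si128(_mm_and_si128(load128(a), load128(b)), load128(c));
    uint64_t t[2];
    _mm_storeu_si128((__m128i *)t, v);
    r.lo = t[0];
    r.hi = t[1];
#else
    /* Portable fallback: two 64-bit halves, as 64-bit-wide code would do. */
    r.lo = (a.lo & b.lo) ^ c.lo;
    r.hi = (a.hi & b.hi) ^ c.hi;
#endif
    return r;
}
```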

> I'm interested in other algorithms, not just DES.

Indeed.  The use of MMX and/or SSE is obviously beneficial for MD5-based
and similar hashes (MD4, SHA-1) - where several hashes may be computed
in parallel in the vector registers.  IIRC, people have been hacking
John to use this approach for raw MD5 and SHA-1 hashes already.
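
MD4, MD5 and SHA-1 all work on 32-bit words, so a 128-bit SSE2 register
holds one word from each of four independent hashes (a 64-bit MMX
register holds two).  The sketch below computes MD4 for four candidate
passwords at once in plain C: state and message words are stored
lane-interleaved, and each inner loop applies the same operation to all
four lanes, which is the shape hand-written SIMD code (or, possibly, a
vectorizing compiler) maps onto single vector instructions.  It handles
only one-block messages (under 56 bytes); md4_x4 and LANES are
hypothetical names, and MD4 was picked only because it is the shortest
of the three.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4   /* four 32-bit lanes fill one 128-bit register */

static uint32_t rotl32(uint32_t x, unsigned s) { return (x << s) | (x >> (32 - s)); }

/* MD4 of LANES independent messages, each shorter than 56 bytes.
 * w[k][lane] and st[r][lane] are lane-interleaved, so every inner
 * "for l" loop is one operation applied to all lanes. */
static void md4_x4(const char *msg[LANES], uint8_t digest[LANES][16])
{
    static const unsigned k2[16] = { 0,4,8,12, 1,5,9,13, 2,6,10,14, 3,7,11,15 };
    static const unsigned k3[16] = { 0,8,4,12, 2,10,6,14, 1,9,5,13, 3,11,7,15 };
    static const unsigned s1[4] = { 3,7,11,19 }, s2[4] = { 3,5,9,13 }, s3[4] = { 3,9,11,15 };
    uint32_t w[16][LANES], st[4][LANES], init[4][LANES];

    /* Pad each message into one 64-byte block (0x80, zeros, bit length). */
    for (int l = 0; l < LANES; l++) {
        uint8_t block[64] = { 0 };
        size_t len = strlen(msg[l]);
        assert(len < 56);
        memcpy(block, msg[l], len);
        block[len] = 0x80;
        uint64_t bits = (uint64_t)len * 8;
        for (int i = 0; i < 8; i++) block[56 + i] = (uint8_t)(bits >> (8 * i));
        for (int k = 0; k < 16; k++)
            w[k][l] = (uint32_t)block[4*k] | (uint32_t)block[4*k+1] << 8 |
                      (uint32_t)block[4*k+2] << 16 | (uint32_t)block[4*k+3] << 24;
        st[0][l] = 0x67452301; st[1][l] = 0xefcdab89;
        st[2][l] = 0x98badcfe; st[3][l] = 0x10325476;
    }
    memcpy(init, st, sizeof st);

    /* 48 steps; the target register rotates A, D, C, B. */
    for (int i = 0; i < 48; i++) {
        int a = (4 - (i & 3)) & 3, b = (a + 1) & 3, c = (a + 2) & 3, d = (a + 3) & 3;
        unsigned s = i < 16 ? s1[i & 3] : i < 32 ? s2[i & 3] : s3[i & 3];
        for (int l = 0; l < LANES; l++) {
            uint32_t x = st[b][l], y = st[c][l], z = st[d][l], f;
            if (i < 16)      f = ((x & y) | (~x & z)) + w[i][l];
            else if (i < 32) f = ((x & y) | (x & z) | (y & z)) + w[k2[i - 16]][l] + 0x5A827999u;
            else             f = (x ^ y ^ z) + w[k3[i - 32]][l] + 0x6ED9EBA1u;
            st[a][l] = rotl32(st[a][l] + f, s);
        }
    }

    /* Add the initial state and emit each lane's digest little-endian. */
    for (int l = 0; l < LANES; l++)
        for (int r = 0; r < 4; r++) {
            uint32_t v = st[r][l] + init[r][l];
            for (int i = 0; i < 4; i++) digest[l][4*r + i] = (uint8_t)(v >> (8 * i));
        }
}
```

With the RFC 1320 test strings "", "a", "abc" and "message digest" as
the four lanes, the digests are 31d6cfe0d16ae931b73c59d7e0c089c0,
bde52cb31de33e46245e05fbdbd6fb24, a448017aaf21d8525fc10ae87aa6729d and
d9130a8164549fe818874806e1c7014b.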

Alexander Peslyak <solar at>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598 - bringing security into open computing environments
