Date: Sat, 19 Nov 2011 19:55:50 -0500
From: Stephen Reese <>
Subject: Re: OpenMP not using all threads

On Sat, Nov 19, 2011 at 5:58 PM, Solar Designer <> wrote:
> On Sat, Nov 19, 2011 at 04:29:12PM -0500, Stephen Reese wrote:
>> I have patched john-1.7.8.tar.gz with john-1.7.8-omp-des-7.diff.gz in
> -omp-des-7 is good if you want to attack just one salt or very few
> salts.  For many salts, -omp-des-4 provides better performance.  (This
> is mentioned on the wiki.)
> Alternatively, if you feel adventurous, you may do a CVS checkout for
> even newer code (currently known as development version, which
> combines the best properties of these two patches into one source code
> tree (no patches are needed).  The CVS checkout instructions are here:
>> order to utilize four threads from a E5520 on a Debian system but it
> Actually, you should be running 8 threads on this CPU unless you have
> Hyperthreading disabled.  But you don't need to worry about that - gcc's
> libgomp will run as many threads as you have logical CPUs by default
> (most likely 8).
>> instead seem like it is only using two. When testing DES I see around
>> 2500K c/s and when patched about 5000K. I was hoping for closer to
>> 8000K to 10000K.
> Yes, you should get about 9000K (for 8 threads combined).
> The increase may be less than 4x because the thread-safe code is slower,
> because the CPU clock rate is lower when all cores are in use (E5520 has
> Turbo Boost), and for certain other reasons.  Yet you should in fact get
> 9000K or so.
>> I also edited the Makefile as follows:
>> # gcc with OpenMP
>> OMPFLAGS = -fopenmp -msse2
>> Another oddity: when testing, I am not seeing the -16 appended to
>> the following:
>> Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
>> Is this normal, or did something go wrong?
> It looks like you made a 32-bit build.  That is, you probably used the
> linux-x86-sse2 make target instead of linux-x86-64.  When you build
> without OpenMP, the -sse2 target uses assembly code supplied with JtR,
> however when you go for OpenMP, gcc has to generate thread-safe code
> instead.  It does this well for x86-64, but not for 32-bit x86 (there
> are too few registers on 32-bit x86).
> To get decent performance at DES with OpenMP builds on your machine, you
> ought to make 64-bit builds.  And indeed your install of Debian should
> be 64-bit, too.
> I hope this helps.
> Alexander
> P.S. The Subject is almost certainly wrong - there's no indication that
> the build doesn't use all threads.  Rather, the threads are slow.


Thanks for the great information, and point taken about the Subject
line. The tests were run on a Linode, which is shared Xen hosting.

I had a feeling that the 32-bit architecture might be an issue, as I
noticed that the "OpenMP example" was only twice as fast with 32-bit
OpenMP, instead of four times as fast with 64-bit OpenMP.
Though the OpenMP example is four times as fast, neither the CVS nor
the stable/patched version of John provides the 4x speed-up I was
hoping for, even on 64-bit. Maybe Xen and the respective hosts behind
the multiple Linodes I am testing are causing a slowdown of roughly
45-60% relative to a bare-metal instance, without affecting the
"OpenMP example".

root@:~# time ./loop2
real    0m2.229s
user    0m2.226s
sys     0m0.002s
root@:~# time ./loop
real    0m0.333s
user    0m1.313s
sys     0m0.003s
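
As a rough check on those timings, here is a small Python sketch (my
own, not part of John). It assumes ./loop2 is the single-threaded
build and ./loop is the OpenMP build, which is inferred from the
user/real ratios rather than stated anywhere. The function names are
made up for illustration.

```python
# Timings copied from the `time` output above, in seconds.
# Assumption: ./loop2 is the serial build, ./loop is the OpenMP build.
serial = {"real": 2.229, "user": 2.226, "sys": 0.002}    # ./loop2
parallel = {"real": 0.333, "user": 1.313, "sys": 0.003}  # ./loop


def wall_clock_speedup(baseline, candidate):
    """Ratio of elapsed (real) times: how much sooner the run finishes."""
    return baseline["real"] / candidate["real"]


def avg_busy_cpus(run):
    """CPU time divided by elapsed time: average number of CPUs kept busy."""
    return (run["user"] + run["sys"]) / run["real"]


speedup = wall_clock_speedup(serial, parallel)
busy = avg_busy_cpus(parallel)

print(f"wall-clock speedup of ./loop over ./loop2: {speedup:.2f}x")
print(f"average busy CPUs during ./loop: {busy:.2f}")
```

On these numbers ./loop keeps close to four CPUs busy on average,
which is what I would expect from four threads. The same user/real
ratio applied to a John benchmark run should show whether all threads
are really getting CPU time on the shared host.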

What I am trying to achieve: I have 42 DES passwords and three
Linodes. The password list is currently split up so that each host has
12 entries, and each host is running in incremental mode. Is there a
better way, such as specifying a thread per instance on a single host?
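
For reference, the split itself amounts to something like the Python
sketch below. The helper name and the placeholder entries are
hypothetical, and it only deals entries out round-robin. It does not
look at salts or do anything John-specific.

```python
def split_round_robin(entries, n_parts):
    """Deal entries out to n_parts lists, one at a time, like cards."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    parts = [[] for _ in range(n_parts)]
    for index, entry in enumerate(entries):
        parts[index % n_parts].append(entry)
    return parts


# Placeholder lines standing in for password file entries (hypothetical).
entries = [f"user{i}:<hash{i}>" for i in range(6)]

parts = split_round_robin(entries, 3)
for host_number, part in enumerate(parts, start=1):
    print(f"host {host_number}: {part}")
```

Each resulting list would be written to its own file and given to one
host. Every entry lands on exactly one host, so no hash is attacked
twice.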

Is there a performance or time benefit to splitting the password list
among multiple hosts, or will one host achieve the same results as
the three?

