Date: Sat, 19 Nov 2011 22:38:30 -0500
From: Stephen Reese <>
Subject: Re: OpenMP not using all threads

On Sat, Nov 19, 2011 at 8:35 PM, Solar Designer <> wrote:
> On Sat, Nov 19, 2011 at 07:55:50PM -0500, Stephen Reese wrote:
>> I had a feeling that the 32-bit architecture might be an issue as I
>> noticed that "OpenMP example" was only twice as fast (32-bit OpenMP)
>> instead of four times (64-bit OpenMP).
>> Though OpenMP example is four times as fast neither the CVS nor
>> stable/patch versions of John would provide the 4x speed-up I was
>> hoping for even on the 64-bit. Maybe XEN and the other respective
>> hosts across the multiple Linodes I am testing are causing roughly a
>> 45 - 60% slowdown from a bare-metal instance but not affecting the
>> "OpenMP Example".
> It appears that you simply have unstable system performance (changing
> over time as load from other VMs changes).
>> root@:~# time ./loop2
>> 615e5600
>> real    0m2.229s
>> user    0m2.226s
>> sys     0m0.002s
>> root@:~# time ./loop
>> 615e5600
>> real    0m0.333s
>> user    0m1.313s
>> sys     0m0.003s
> This would be a 7x speedup if it were for real, but notice how the user
> time decreased as well - indicating that load from other VMs probably
> halved between these two invocations.  You'll need many more invocations
> of your benchmarks to see the overall difference between the different
> builds despite the changing load.
>> What I am trying to achieve: I have 42 DES passwords and three
>> Linodes. The password list is currently split up so each host has 12
>> entries and runs in incremental mode. Is there a better way,
>> such as specifying a thread per instance on a single host?
>> Is there a performance/time benefit in splitting up the password list
>> amongst multiple hosts or is one host going to achieve the same
>> results as the three?
> This depends on the hashes per salt ratio.  You didn't mention how many
> different salts you have.  Is it 42 hashes with 42 different salts?
> Anyhow, you may achieve a very slight increase in c/s rate (due to lower
> key setup overhead) by not splitting your 42 hashes (have all nodes load
> all 42), but instead splitting the candidate password space.  However,
> this improvement would probably be negated by slightly less optimal
> order in which candidate passwords would be tested then (e.g., you'd
> split by length: 0-6, 7, 8).  So continuing like you have started is
> fine.  3*12 is 36, not 42, though.
> Also note that OpenMP generally performs poorly when the system is
> under other load.  In your case, this other load comes from other VMs.
> Even a 10% load from other processes/VMs may result in a 50% slowdown of
> your task with OpenMP, unfortunately.  And it can be even worse than
> that: (yes, you may
> try the GOMP_SPINCOUNT workarounds from there).
> As an alternative, you may try an MPI build of -jumbo, even across all
> three of your Linodes.
> Or as a simpler alternative, yes, you may choose to use many instances
> of non-OpenMP builds.  Then other load will have less of an effect, but
> the key setup overhead will increase.  The CVS version and the
> -fast-des-key-setup-3 patch (your choice) reduce the key setup overhead,
> though, making it almost negligible.  In 1.7.8 release, it's about 10%
> when cracking just one DES-based crypt(3) hash.  With the newer code or
> the patch, it reduces to about 3%.  You probably lose a lot more than
> that to OpenMP's unfriendliness to system load, so you'll improve things
> overall by going for separate processes.
> Alexander


Thanks for the quick response. There are 42 hashes and 42 unique salts
(13/node). I am going to change this so that every node loads all 42
hashes and each node covers a different length range (1-6, 7, 8 for
All.chr).

OpenMP has consistently been around 5000K c/s, but I tested another
recommendation of yours, running non-OpenMP builds, because of the
previously discussed system load woes (GOMP_SPINCOUNT did not help).
Four non-OpenMP instances run at 2000K each and a fifth at ~1000K,
sharing the same john.pot and password file via multiple sessions. They
seem to even out after a bit, but a combined 9000K is great! This is
what I was looking for.
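
As a rough check on those figures, the arithmetic can be sketched in
Python. The rates are the approximate ones reported above, in thousands
of candidates per second; the variable names are only illustrative.

```python
# Approximate rates from the runs above, in K c/s (thousands of
# candidate passwords per second). These are eyeballed figures,
# not precise measurements.
openmp_rate_k = 5000                               # single OpenMP build
process_rates_k = [2000, 2000, 2000, 2000, 1000]   # five non-OpenMP instances

# Separate processes add up, since each one works independently.
combined_k = sum(process_rates_k)

# Ratio of the combined non-OpenMP rate to the OpenMP rate.
speedup = combined_k / openmp_rate_k

print(f"combined: {combined_k}K c/s")
print(f"vs. OpenMP: {speedup:.1f}x")
```

With these numbers the separate processes come out at roughly 1.8 times
the OpenMP rate, though the load from other VMs will move both figures
around.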

Thanks for your help!

