john-users - Re: OpenMP not using all threads

Message-ID: <20111120013523.GA7317@openwall.com>
Date: Sun, 20 Nov 2011 05:35:23 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: OpenMP not using all threads

On Sat, Nov 19, 2011 at 07:55:50PM -0500, Stephen Reese wrote:
> I had a feeling that the 32-bit architecture might be an issue as I
> noticed that "OpenMP example" was only twice as fast (32-bit OpenMP)
> instead of four times (64-bit OpenMP).
> http://openwall.info/wiki/internal/gcc-local-build#OpenMP-example.
> Though OpenMP example is four times as fast neither the CVS nor
> stable/patch versions of John would provide the 4x speed-up I was
> hoping for even on the 64-bit. Maybe XEN and the other respective
> hosts across the multiple Linodes I am testing are causing roughly a
> 45 - 60% slowdown from a bare-metal instance but not affecting the
> "OpenMP Example".

It appears that you simply have unstable system performance (changing
over time as load from other VMs changes).

> root@:~# time ./loop2
> 615e5600
> real    0m2.229s
> user    0m2.226s
> sys     0m0.002s
> root@:~# time ./loop
> 615e5600
> real    0m0.333s
> user    0m1.313s
> sys     0m0.003s

This would be a roughly 7x speedup if it were for real, but notice how the
user time decreased as well (from 2.226s to 1.313s) - indicating that load
from other VMs probably halved between these two invocations.  You'll need
many more invocations of your benchmarks to see the overall difference
between the different builds despite the changing load.
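
One way to get "many more invocations" is to time each build repeatedly and
compare the minimum and median rather than single runs.  A minimal sketch,
assuming the ./loop and ./loop2 binaries quoted above are in the current
directory; the function names and the run count are arbitrary illustrations:

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=20):
    """Run cmd `runs` times; return the wall-clock time of each run in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (min, median, max) of the timing samples."""
    return min(samples), statistics.median(samples), max(samples)


def compare(commands, runs=20):
    """Print min/median/max wall-clock time for each command."""
    for cmd in commands:
        lo, med, hi = summarize(time_command(cmd, runs))
        print(f"{cmd[0]}: min {lo:.3f}s  median {med:.3f}s  max {hi:.3f}s")
```

For example, compare([["./loop2"], ["./loop"]]) would time both builds.  The
minimum is usually the least affected by load from other VMs, while the
spread between minimum and maximum shows how unstable the system is.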

> What I am trying to achieve: I have 42 DES passwords and three
> Linodes. Password list is currently split-up so each host has 12
> entries and are running in incremental mode. Is there a better way,
> such as specifying a thread per instance on a single host?
> 
> Is there a performance/time benefit in splitting up the password list
> amongst multiple hosts or is one host going to achieve the same
> results as the three?

This depends on the hashes per salt ratio.  You didn't mention how many
different salts you have.  Is it 42 hashes with 42 different salts?

Anyhow, you may achieve a very slight increase in c/s rate (due to lower
key setup overhead) by not splitting your 42 hashes (have all nodes load
all 42), but instead splitting the candidate password space.  However,
this improvement would probably be negated by the slightly less optimal
order in which candidate passwords would then be tested (e.g., you'd
split by length: 0-6, 7, 8).  So continuing like you have started is
fine.  3*12 is 36, not 42, though.
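
If you do keep splitting the hashes, an even split of 42 over three nodes is
14 per node, and hashes sharing a salt are best kept on the same node.  A
minimal sketch of such a split, assuming passwd-style lines ("user:hash:...")
or bare hashes, and relying on the fact that the salt of a traditional
DES-based crypt(3) hash is its first two characters; the function names are
hypothetical:

```python
from collections import defaultdict


def salt_of(line):
    """Return the two-character salt of a traditional DES-based crypt(3) hash.

    Assumes a passwd-style line ("user:hash:...") or a bare 13-character hash.
    """
    fields = line.strip().split(":")
    hash_field = fields[1] if len(fields) > 1 else fields[0]
    return hash_field[:2]


def split_by_salt(lines, nodes):
    """Assign whole salt groups to nodes, keeping the per-node counts balanced."""
    groups = defaultdict(list)
    for line in lines:
        groups[salt_of(line)].append(line)
    parts = [[] for _ in range(nodes)]
    # Place the largest groups first, each onto the node with the fewest hashes.
    for group in sorted(groups.values(), key=len, reverse=True):
        min(parts, key=len).extend(group)
    return parts
```

With 42 different salts this degenerates into a plain 14/14/14 split; with
repeated salts it avoids having two nodes each compute hashes for the same
salt.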

Also note that OpenMP generally performs poorly when the system is
under other load.  In your case, this other load comes from other VMs.
Even a 10% load from other processes/VMs may result in a 50% slowdown of
your task with OpenMP, unfortunately.  And it can be even worse than
that: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706 (yes, you may
try the GOMP_SPINCOUNT workarounds from there).
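
GOMP_SPINCOUNT is read by libgomp from the environment, so the workaround
amounts to setting it before the OpenMP-enabled process starts.  A minimal
sketch; the helper names are hypothetical, and the value to use is something
to experiment with rather than a recommendation:

```python
import os
import subprocess


def build_env(spincount, base=None):
    """Return a copy of the environment with GOMP_SPINCOUNT set."""
    env = dict(os.environ if base is None else base)
    env["GOMP_SPINCOUNT"] = str(spincount)
    return env


def run_with_spincount(cmd, spincount):
    """Run cmd (a list of arguments) with GOMP_SPINCOUNT set for that process only."""
    return subprocess.run(cmd, env=build_env(spincount), check=True)
```

For example, run_with_spincount(["./john", "passwd"], 10000), where the path
to john, the password file name, and the value 10000 are placeholders.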

As an alternative, you may try an MPI build of -jumbo, even across all
three of your Linodes.

Or as a simpler alternative, yes, you may choose to use many instances
of non-OpenMP builds.  Then other load will have less of an effect, but
the key setup overhead will increase.  The CVS version and the
-fast-des-key-setup-3 patch (your choice) reduce the key setup overhead,
though, making it almost negligible.  In the 1.7.8 release, it's about 10%
when cracking just one DES-based crypt(3) hash.  With the newer code or
the patch, it drops to about 3%.  You probably lose a lot more than
that to OpenMP's unfriendliness to system load, so you'll improve things
overall by going for separate processes.
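
To put the figures above side by side, here is a rough back-of-the-envelope
comparison.  It assumes each percentage can be read as a fraction of the
ideal c/s rate that is lost, which is a simplification; the OpenMP figure is
the "may result in" case mentioned earlier, not a measurement:

```python
def relative_rate(lost_fraction):
    """Fraction of the ideal c/s rate left after losing `lost_fraction` of it."""
    return 1.0 - lost_fraction


# Figures quoted in this message, read as fractions of the ideal rate lost.
rates = {
    "separate processes, 1.7.8 (10% key setup)": relative_rate(0.10),
    "separate processes, newer code or patch (3%)": relative_rate(0.03),
    "OpenMP with 10% other load (50% slowdown)": relative_rate(0.50),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} of ideal")
```

Under these assumptions the key setup overhead of separate processes is small
next to what OpenMP may lose on a loaded system.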

Alexander