john-users - Hyperthreading / fork versus mpi / instruction sets?



Message-ID: <000b01d32a6c$a64cec50$f2e6c4f0$@dexlab.nl>
Date: Sun, 10 Sep 2017 21:40:25 +0200
From: <spam@...lab.nl>
To: <john-users@...ts.openwall.com>
Subject: Hyperthreading / fork versus mpi / instruction sets?

Hi list,

I'm using john on a single system with a very large number of CPUs, all of
which support hyperthreading. I'm trying to figure out which combination of
settings is fastest. From a practical point of view I can run benchmarks,
simply timing the same task with different settings; no problems there.
However, I'd like to have a good understanding of the concepts and, where
applicable, of the relevant under-the-hood details of john.
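Timing the same task under different settings, as described above, can be
scripted. The following is a minimal sketch in Python, not part of john:
`time_command` and `compare` are hypothetical helpers that take the median of
several runs. The example variants assume a jumbo build that accepts
`--format`, `--wordlist`, `--pot` and `--fork` (check `john --help` on your
build); `words.txt`, `hashes.txt` and `bench.pot` are placeholder names. The
pot file is deleted before every run because john skips hashes already in
its pot file, which would make later runs look artificially fast.

```python
import os
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=3, setup=None):
    """Run cmd (an argv list) `runs` times; return the median wall-clock seconds.

    setup, if given, is called before every run (e.g. to delete a pot file
    so that each run has to crack the same hashes again).
    """
    durations = []
    for _ in range(runs):
        if setup is not None:
            setup()
        start = time.perf_counter()
        # check=True raises CalledProcessError on failure, so a broken
        # configuration is not mistaken for a fast one.
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def compare(variants, runs=3, setup=None):
    """variants maps a label to an argv list; returns (label, seconds), fastest first."""
    results = {label: time_command(cmd, runs, setup) for label, cmd in variants.items()}
    return sorted(results.items(), key=lambda item: item[1])


if __name__ == "__main__":
    john = sys.argv[1] if len(sys.argv) > 1 else "./john"
    pot = "bench.pot"  # hypothetical scratch pot file, deleted before every run

    def fresh_pot():
        if os.path.exists(pot):
            os.remove(pot)

    # Placeholder file names; the options assume a jumbo build.
    base = [john, "--format=raw-md5", "--wordlist=words.txt", "--pot=" + pot]
    variants = {f"fork={n}": base + [f"--fork={n}", "hashes.txt"] for n in (2, 4, 8)}
    for label, seconds in compare(variants, setup=fresh_pot):
        print(f"{label}: {seconds:.2f} s")
```

The median is used rather than the mean so that a single run disturbed by
other system activity does not skew the comparison.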

Hyperthreading:
As far as I understand, this is beneficial only if the cracking code is not
100% optimized. So in theory it is not useful: an HT thread cannot do any
useful work because the 'real' core is already fully saturated. In practice:
run john on the HT (logical) cores as well to maximize CPU utilization; even
if the cracking code is 100% optimized, this won't hurt (no disadvantages).
Two questions: (1) is this correct? (2) Any advice about how many extra HT
processes to assign? Use all of them? Or just one or two, to compensate for
a small fraction of non-optimal code?
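To decide how many extra processes to try, it helps to know how many logical
CPUs map onto how many physical cores. A minimal sketch, assuming Linux, where
`/proc/cpuinfo` lists one block per logical CPU with `physical id` and
`core id` fields; `count_cpus` is a hypothetical helper, not part of john.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from Linux /proc/cpuinfo text.

    Physical cores are counted as distinct (physical id, core id) pairs.
    If those fields are missing (e.g. in some VMs), physical_cores falls
    back to the logical count.
    """
    logical = 0
    cores = set()
    physical_id = core_id = None
    # Append a blank line so the last processor block is also flushed.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, (len(cores) or logical)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print(f"{logical} logical CPUs on {physical} physical cores "
          f"({logical // physical} hardware threads per core)")
```

Benchmarking once with as many processes as physical cores and once with as
many as logical CPUs would then answer question (2) empirically for a given
hash format.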

Fork vs. MPI:
I've noticed that a number of hash formats support MPI and that john runs
those hash types on MPI by default. Furthermore, I've seen that forked
parallel processing (--fork=n) is possible for all hash types. AFAIK, MPI is
typically used in network-connected multi-system environments, whereas
forking is done on one machine. My assumption is that forking is more
efficient than MPI because it has less overhead (= faster). However, MPI
might allow more granular control, rescheduling work during the cracking
process to get maximum efficiency, but this is *only* useful if the MPI
latency is extremely low compared to the time spent cracking each unit of
work. My questions: (1) is this correct? (2) What's the best approach for
fast hashes (e.g. raw-md5)? (3) What's the best approach for slow hashes
(e.g. bcrypt)?
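The latency argument above can be made concrete with a toy model: if a
worker blocks for one coordination round trip per batch of candidates, the
fraction of time it spends hashing is compute / (compute + latency). This is
a sketch of that model only, not of how john or MPI actually schedule work;
the batch size, latency and per-core speeds are made-up illustrative values,
not measurements. A static up-front split of the work with no runtime
communication corresponds to `sync_latency_s = 0` in this model.

```python
def parallel_efficiency(batch_candidates, candidates_per_second, sync_latency_s):
    """Fraction of wall time a worker spends hashing, assuming it does one
    blocking coordination round trip (sync_latency_s) per batch of work."""
    compute_s = batch_candidates / candidates_per_second
    return compute_s / (compute_s + sync_latency_s)


if __name__ == "__main__":
    batch = 1_000_000   # candidates per work unit (hypothetical)
    latency = 0.001     # 1 ms per coordination round trip (hypothetical)
    # Illustrative per-core speeds only, not measurements.
    for name, speed in [("fast hash", 50_000_000), ("slow hash", 1_000)]:
        eff = parallel_efficiency(batch, speed, latency)
        print(f"{name}: {eff:.4%} of time spent hashing")
```

Under these assumptions the coordination cost is noticeable only for the
fast hash, where each batch finishes in milliseconds; for a slow hash it is
negligible.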

Instruction sets:
I've noticed that john contains *a lot* of optimized code specific to
particular instruction sets (e.g. SSE4.x/AVXx). Older multi-CPU Xeon E5 and
E7 systems are quite cheap nowadays, and in terms of absolute performance
they're still extremely fast (still among the fastest processors listed).
However, they lack e.g. AVX2 support. Right now it's very difficult for me to
figure out the best choice without buying massively expensive new CPUs.
Question: is there a benchmark available that compares john builds for
different instruction sets on the latest CPU supporting all of them? That way
one could figure out the advantage of buying new CPUs from a john cracking
perspective.
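Before comparing builds, one can check which SIMD extensions a given machine
actually offers. A minimal sketch, assuming Linux, where the `flags` line in
`/proc/cpuinfo` lists them under names such as `sse4_1`, `avx2` and
`avx512f`; `simd_support` is a hypothetical helper, not part of john. It only
reports what the CPU supports; which code paths john uses depends on how it
was built.

```python
# Linux /proc/cpuinfo flag names for SIMD extensions, roughly oldest to newest
# (xop is AMD-only).
SIMD_FLAGS = ["sse2", "ssse3", "sse4_1", "sse4_2", "avx", "xop",
              "avx2", "avx512f", "avx512bw"]


def simd_support(cpuinfo_text):
    """Return {flag: bool} for each entry of SIMD_FLAGS, based on the first
    'flags' line in /proc/cpuinfo text."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present.update(value.split())
            break  # CPUs in one system normally report identical flags
    return {flag: flag in present for flag in SIMD_FLAGS}


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        support = simd_support(f.read())
    print("supported:", " ".join(name for name, ok in support.items() if ok))
    print("missing:  ", " ".join(name for name, ok in support.items() if not ok))
```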

Any other considerations or advice, especially about maximizing CPU cracking
performance, are also more than welcome! Thank you.
