Date: Fri, 12 Mar 2010 01:35:18 +0100
From: "Magnum, P.I." <rawsmooth@...dband.net>
To: john-users@...ts.openwall.com
Subject: Re: Is JTR MPIrun can be optimized for more cores ?
RB wrote:
> 2010/3/10 Solar Designer <solar@...nwall.com>:
>>> Please be aware that the MPI patch by itself induces (as I recall)
>>> 10-15% overhead in a single-core run.
>> Huh? For "incremental" mode, it should have no measurable overhead.
>
> This is what my experimentation showed. Whether it's the MPI
> initialization or something else, the difference between patched and
> non-patched was statistically significant on my Phenom x4. I'll
> repeat the tests to get more precise numbers, but it's why I made sure
> it was optional.
I did some testing. First, an --inc=digits run against 1634 DES hashes with
the same salt, until completion. 129 of the hashes were cracked.
- john-jumbo2: 1m27.074s
- mpijohn on one core: 1m27.045s
- mpijohn on two cores: 1m7.025s
The lousy figure when using two cores is because one of the cores
completed its run after just 22 seconds! Not optimal. I tried some
longer jobs running alpha, but the problem remains: one job completes
long before the other. In real-world usage, incremental mode is not
supposed to run to completion, so this won't be much of a problem. On
the other hand, the problem will be much larger when using many more cores.
Anyway, the tests show that MPI-john has no overhead in itself, just as
I expected. Running on one core, it performs just like vanilla john. It's
*only* a matter of how we split the jobs. So I did another test, using
my MPI patches that auto-split the Markov range. In this mode the
workload is always evenly split, and this mode is *supposed* to run to
completion.
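Just to illustrate the principle (this is only a rough sketch in C/MPI,
not the actual patch code - names like markov_split, mpi_id and mpi_p
are made up here): evenly splitting the range amounts to giving each
node a contiguous, near-equal chunk of the candidate range:

    /* Hypothetical sketch only - not the actual patch code. Given the
     * total number of candidates in the Markov range, hand each MPI
     * node a contiguous, near-equal chunk of it. */
    #include <mpi.h>

    static void markov_split(unsigned long long total,
                             unsigned long long *my_start,
                             unsigned long long *my_end)
    {
        int mpi_id, mpi_p;

        MPI_Comm_rank(MPI_COMM_WORLD, &mpi_id);  /* this node's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &mpi_p);   /* number of nodes  */

        /* Node mpi_id handles [total*mpi_id/mpi_p, total*(mpi_id+1)/mpi_p) */
        *my_start = total * (unsigned long long)mpi_id / mpi_p;
        *my_end   = total * (unsigned long long)(mpi_id + 1) / mpi_p;
    }
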
I ran -markov:250 against 15 LM hashes until completion. 10 were cracked:
- john-jumbo2: 10m0.313s
- mpijohn on one core: 10m1.249s
- john-jumbo2 manually parallel: 5m13.690s
- mpijohn on two cores: 5m14.277s
This is less than 0.2% overhead. Actually, all MPI overhead occurs
before and after the jobs actually run. For a run 100 times longer, the
overhead *should* be 1/100 of that seen here - and thus completely
insignificant.
The tests were run on Linux/amd64, using MPICH2, on some Intel Core 2
laptop thingy with the CPU speed pegged to 1.6 GHz (due to some
problems with heat ^^ )
FWIW: in wordlist and single modes, john-fullmpi will currently leapfrog
rules if any are used, and otherwise leapfrog words. I haven't yet tested
whether the latter would be better all the time. If so, and when loading
the wordlist into memory as introduced by the jumbo patch, an MPI job
should ideally load only its own share of words (see the sketch below).
It's not very sensible to load the full 134 MB "rockyou" wordlist in 32
copies into memory on a 32-core host.
cheers
magnum