john-users - Re: to MPI or not MPI john?

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20060607071651.GE9799@openwall.com>
Date: Wed, 7 Jun 2006 11:16:51 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: to MPI or not MPI john?

On Tue, Jun 06, 2006 at 02:51:18PM +0200, Otheus (aka Timothy J. Shelling) wrote:
> You mentioned a million hashes per second --- now I see your point about why
> this approach is not so scalable. The "cracks per second" metric reported by
> the --status command must then be including the words *rejected* by the
> filter.

No, it does not.

> Otherwise I wouldn't get almost perfect speedup using the internal
> chksum filter.

You got almost perfect speedup because you had many salts and not too
many nodes.  You mentioned 299 salts.  With 1M hashes computed per
second on each node, that gives us "only" 3,300 of candidate passwords
tried per second on each node.  If you have, say, 4 nodes, it means that
each node has to generate and chksum-filter around 13,000 of candidate
passwords per second.  That's not too many - so most of the CPU time is
left for the actual hash computations.

This is why I am saying that the scalability problem will primarily be
seen with saltless hashes or when cracking hashes with a single salt or
with a small number of salts.

> >> ... with unsalted hashes,
> >> it's clearly optimal to distribute the target patterns (the hashed
> >> passwords) amongst the nodes.
> >
> >I re-read the above several times, but I still don't understand you.  If
> >I parse the above literally, you're suggesting that with multiple
> >unsalted hashes it would be optimal to have different nodes compare
> >against different hashes
> 
> That is what I'm saying, assuming word generation is (even slightly) slower
> than hash generation.

That's irrelevant.  Even if each candidate password takes 1 second to be
generated and only 1 microsecond to be hashed, you do not want to be
repeating any of those operations on each node - for the same candidate
passwords - only for the computed hashes to be compared against
different loaded ones.  You can do the same on a single node.  By using
multiple nodes in this way, you're only distributing the processing cost
of hash comparisons - not of candidate password generation and not of
hashing.  It is far more reasonable to distribute the latter costs, if
the implementation is scalable enough for the given number of nodes.

> Let's say you have a target password database of 10 hashes, and you have 2
> nodes.  Each node uses the same word-generation algorithm. Give each node
> half the hashes to crack. They should (on average) both complete (at about)
> half the time it would have taken 1 node with those same 10 hashes.

No.  If those hashes are saltless, both nodes would be done with their
share of the work at about the same time it would have taken a single
node to complete all of the work with 10 hashes.  So there would be no
speedup from using multiple nodes at all.  (This assumes that the cost
of hash comparisons is negligible, which is not always true.  But let's
not get into that for now.)

It's only with salted hashes and multiple salts that distributing the
hashes across your nodes makes sense.  You might not see the perfect 2x
speedup with 2 nodes, though, since there's some per-candidate-password
(rather than per-salt) processing even with salted hashes.  Obviously,
this "overhead" is only visible with small numbers of salts per node.
If you have 1,000 hashes and 10 nodes, then each of your nodes will be
doing the per-candidate-password processing only once per 100 hash
computations, so the "overhead" will be negligible.  But with 10 hashes
and 10 nodes, it might correspond to around 10% of your total processing
cost.  So you'd get a 9x speedup over a single node - not 10x.

> If we have 10 nodes, and we split the target database of hashes, then it
> should take (roughly) about 1/10th the time it would have taken 1 node.
> 
> How could it get any faster than that?

As I've explained above, it might take 1/9th the time - and you can
improve that to 1/10th by splitting the keyspace efficiently instead.

Also, this distribute-the-hashes approach only works as long as the
number of salts is no smaller than the number of nodes.  When these
numbers become close to each other, things get unoptimal.  When the
number of different salts is smaller than the number of nodes, you
can't make use of some of your nodes at all.  So we're back to splitting
the keyspace.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.