john-dev - distributed processing with untrusted machines (was: MD5-crypt on Nvidia GPUs tips)

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120402150435.GA11351@openwall.com>
Date: Mon, 2 Apr 2012 19:04:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: distributed processing with untrusted machines (was: MD5-crypt on Nvidia GPUs tips)

On Mon, Apr 02, 2012 at 11:05:12AM +0200, Simon Marechal wrote:
> On 01/04/2012 13:03, Solar Designer wrote:
> > On the other hand, machines with mostly-idle CPUs
> > tend to be readily available at any a given company (and in large
> > quantities), whereas high-end GPUs would need to be specifically
> > purchased for the password cracking.  So the comparison is not simple if
> > we try to consider these things.
> 
> This is true for benchmarking,

...and for contests, as well as for many kinds of malicious use. ;-)

> but a box with 4 large GPUs can be locked
> in a room and disconnected from untrusted networks. It is harder to
> trust random PCs in most companies with confidential customer data.

Sure.  To partially mitigate this, once we finally get distributed
processing support (better than the current MPI stuff) into JtR, a
potential enhancement will be supporting operation modes where
relatively little is revealed to the nodes.  I think this can be
accomplished in one of two ways (both may be supported as options), or
with a mix of the two:

1. Only partial hashes may be distributed to nodes (e.g., 32-bit or
smaller).  The nodes will produce a steady stream of false positives
(but not so many that this would become a bottleneck), which the master
will need to verify (and record successful guesses only on full match).
The master will need to avoid notifying the nodes of successful guesses
or it may do so with a large random delay such that the timing of such
notifications does not reveal which partial hash matches correspond to
real guesses (full matches).  The master will also need to be careful
not to reveal this via side-channels, such as timing of its further
communication with the nodes (e.g., a disk file write could make a
polling delay slightly longer than usual).  The stream of random numbers
used for the random delays (if those are used) should be
cryptographically secure.

2. For very slow hashes and relatively small networks, the nodes may
perform the hashing only, but no comparisons.  Instead, they'll feed the
computed hashes (or partial hashes to save on traffic) back to the
master, which will do the comparisons.  For example, if we have 100
machines each doing bcrypt at 5000 c/s, that's only 500k hashes to be
sent back to the master per second.  Even if we send, say, 20 bytes per
hash (e.g., 16 bytes for plaintext password and 4 bytes for partial
hash), that's only 10 MB/sec - so a master on a 100 Mbps Ethernet port
will be able to receive that stream.

A possible combination of the two approaches may involve very small
partial hashes being sent to the nodes.  For example, 8-bit.  When
there's just one hash per salt (which is typical when auditing hashes
from well-designed systems), this will reduce the traffic of the second
approach above by a factor of 256, thereby allowing for lower overhead,
much greater scalability, and/or use of the approach for not-so-slow
hashes (like MD5-crypt).

Unfortunately, with any of these approaches the full salts will need to
be revealed to the nodes.  This is especially bad for hash types where
usernames are used as salts.  While it is possible to add some fake
salts into the mix, that would reduce efficiency a lot while being of
very little help in protecting the actual salts.  Well, it can be
affordable and it can actually protect the salts in some special cases
(such as when we use all 4096 salts with DES-crypt), but in those cases
this is also relatively unimportant.

For some hash types, where salts are only involved during early steps of
computation of a slow hash (e.g., phpass or better yet its Drupal 7's
revision), it may be possible to process the salts on the master, but
then we also have a steady stream going from the master to the nodes.

So I think we won't bother protecting the salts and assume that this
stuff would be for at least semi-trusted systems.

While I had these thoughts for years, I think that actually implementing
this is still in a distant future for us (if we get there at all).  We
need to gain built-in distributed processing first (non-MPI), and only
then worry about enhancing it.

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.