|   | 
| 
 | 
Message-ID: <20051013023349.GA26577@openwall.com> Date: Thu, 13 Oct 2005 06:33:49 +0400 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: Re: Using Hardwareaccelerators to speed up John On Thu, Oct 13, 2005 at 02:14:16AM +0200, sebastian.rother@...erlin.de wrote: > At first: Sorry Solar but I start from zero. > I can't know what you maybe forwarded or not. > So I'm sorry if something here (or maybe even everything) is already known. > > I already talked (a littlebit *looks into the sky*) with Solar direclty. Sebastian, -- actually, I am grateful to you for including "full context" in this posting. :-) > Using OpenSSL could speed up John if a crypto-accelerator is present and > if the OpenSSL-Version supports these Cards. > I'm using OpenBSD wich supports such functions for about 2 years now (but > the official OpenSSL should provide it now also). That's only a part of the theory. There are many things which can make this fail in practice: 1. As you mention, many (most?) crypto accelerator cards are actually quite slow or optimized for operations which are of no use for John presently (e.g., public key crypto). 2. With password cracking, we're encrypting or hashing only very small bits of data, each time with a new key (candidate password). Crypto hardware, however, is generally optimized for an entirely different task, where a lot of data is encrypted or decrypted with a given key and/or a message-digest is calculated for a non-negligible amount of data (e.g., an IP packet). The published Mbps rates are usually those for intended uses and do not take into account any key setup overhead. If the implementations would be abused for password cracking, that overhead could easily amount to most of the processing time. 3. Speaking of DES-based crypt(3) in particular (including all of its flavors that I am aware of), I have not seen any crypto accelerator card vendor declare support for the modified DES that is used in crypt(3). It means that most (all?) of such cards can't be used for computing those hashes, except for the special case of "salt == 0" (encoded as ".."), -- that would be one unlucky password hash in about 4096 for the traditional crypt(3). 4. The stronger password hash types all use multiple iterations of underlying cryptographic primitives. For example, the traditional crypt(3) uses 25 iterations of modified-DES, and the FreeBSD-style MD5-based crypt(3) uses 1000 iterations of MD5 compression function (for passwords no longer than 15 characters). I have not seen any crypto accelerator card vendor declare support for iterating a cryptographic primitive over the same data (with the output becoming the new input) without involvement from the host system. Additionally, often non-trivial processing is required between the iterations, such as for the MD5-based crypt(3). Let's use it as an example: The current implementation of the MD5-based crypt(3) in John (that does not yet use MMX/SSE/AltiVec and the like, -- great speedups are possible here!) achieves 5k c/s on a typical Pentium 4 processor (and up to 10k c/s on the fastest ones available). The 5k c/s correspond to 5 million invocations of the MD5 compression function a second, plus a lot of "high level" overhead. The compression function takes a 64-byte data block and a 16-byte vector as its input, and produces another 16-byte vector as its output. That's 96 bytes of data to transfer per invocation. (In practice, it is likely that a crypto card would not offer the compression function on its own, resulting in more overhead.) Ignoring any protocol overhead, that would amount to 480 Mbytes/second of data transfer to/from the card. That's almost 4 times the PCI bandwidth. Of course, faster buses do exist, but didn't we want an economical solution and also one allowing to use multiple cards in a system (with all sharing the same bus)? 5. As I have already mentioned, some high-level overhead can't be off-loaded from the main CPU unless the crypto card would happen to implement the exact high-level function we require. Such overhead is very far from negligible. Additionally, off-loading the actual crypto to the card introduces the communications overhead. Some software needs to speak a communications protocol to the card. The complexity for such work is comparable and can even exceed that of computing the cryptographic primitives right on the main CPU like John does currently. 6. Some simpler/quicker hash types, such as LM hashes, are likely possible to implement almost fully on existing crypto cards, however they're so quick that the point in doing so is moot. The CPU would once again have to exchange data with the card at such a high rate where the overall performance might not be any better (and is likely to be worse) than it is without the card. The main CPU would also be the one to continue to generate candidate passwords to try at this high rate, so even if the bandwidth and communication protocol overhead issues were somehow resolved, the CPU would quickly become the bottleneck. So much for adding more crypto cards to a system. > Offen you'll find just some realy lame Chips on VPN-Hardware but if you > don#t buy such a Cisco-Junk solution you could also get such a device here > (not sold yet): > > http://www.soekris.com/vpn1461.htm > > This card could, depends to the algorithm, do e.g. up to 920Mbps of DES. Now this is not that bad, however, John already achieves better than that on modern CPUs. In particular, it achieves 1M c/s for traditional crypt(3) on PPC G5 1.8 GHz or P4 3.6 GHz (the latter with non-public SSE code, I must admit). This roughly corresponds to 1.6 Gbps at DES. PPC G5 2.7 GHz does over 1.6M c/s, which roughly corresponds to 2.5 Gbps at DES. More importantly, please see above for why this rate likely does not apply to password cracking. > I thought about this idea because i've some Pentium2-Motherboards here. > If I would e.g. buy 5 Crypto-Cards they would work simultaniusly (like > SMP) on OpenBSD. The algorithm isn't the best Theo told me but it should > work. Your CPU would be the bottleneck. I doubt its performance would be sufficient to fully make use of a single card like that. > because these cards are cheap AND fast it would be logical to "abuse" > these cards to speed up john a littlebit. This may seem logical at first, but it really won't work well. I think I've given enough reasons why not, above. However, not all is lost. There are things which actually would work: 1. General-purpose FPGA-based boards. These would need to be programmed for the very specific task. I briefly evaluated this possibility back in 1998-1999 and it appeared that FPGAs would deliver roughly 5 times better DES performance for the money, compared against the most suitable CPUs (at the time, that was Alpha 21164PC - affordable and really good at bitslice DES). I used retail prices; the improvement could be a lot better for large quantities. 2. A special-purpose accelerator card for John, if such a beast would ever be developed. ;-) (Or any card designed with password cracking in mind, not necessarily for John.) Neither has anything to do with OpenSSL. As explained in John the Ripper documentation, I found crypt(3) to be too limited an API to allow for most efficient implementations, -- which is why John uses other interfaces internally. The OpenSSL API is similarly too limited. > OpenBSD on e.g. a new Via Epia Motherboard would enable John to crack AES > with nearly 20Gbit/s because of the VIa "Pad-Lock"-Engine wich is also > fully supported by OpenBSD. Of course, encryption speed does not translate to cracking speed, as I think I have explained above. (It works better the other way around, which is why I dare to make some statements about John's performance above.) I'd bet the 20 Gbps involves no key setup. With cracking, we've got primarily key setup. Having this said, I'd be interested to review the specifications for this motherboard, -- got an URL? -- Alexander Peslyak <solar at openwall.com> GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598 http://www.openwall.com - bringing security into open computing environments Was I helpful? Please give your feedback here: http://rate.affero.net/solar
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.