Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 17 Oct 2005 03:13:43 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Using Hardwareaccelerators to speed up John

Frank,

Thank you for providing references to David Hulton's work in this area.
I was not aware of it.

I've invited David to join us on this mailing list, which he just did.
David, -- any comments you could provide would be most appreciated.

My comments are inline:

> Solar Designer wrote:
> >1. General-purpose FPGA-based boards.  These would need to be programmed
> >for the very specific task.  I briefly evaluated this possibility back
> >in 1998-1999 and it appeared that FPGAs would deliver roughly 5 times
> >better DES performance for the money, compared against the most suitable
> >CPUs (at the time, that was Alpha 21164PC - affordable and really good
> >at bitslice DES).  I used retail prices; the improvement could be a lot
> >better for large quantities.

On Fri, Oct 14, 2005 at 12:20:31AM +0200, Frank Dittrich wrote:
> I think the technology has improved a lot.

Indeed, it has.

However, my estimate from 6+ years ago ("5 times better DES performance
for the money") appears to still hold true for low-end FPGAs.

> Also have a look at the slides:
> http://www.ccc.de/congress/2004/fahrplan/files/340-fpga-slides.pdf

On slide 35, "Password File Cracker", the following performance numbers
are given:

"PC (3.0Ghz P4 \w john)" - 300,000 c/s
"Hardware (Low end FPGA \w jawn)" - 4,000,000 c/s

I am assuming that these are for traditional DES-based crypt(3), which
is 25 iterations of modified-DES.

My guess is that the 3 GHz P4 benchmark was done with John 1.6, which
did not yet use bitslice DES on x86 processors.  Current publicly
available development versions of John do around 700k c/s on 3 GHz P4s.
Current non-public development versions of John do around 900k c/s with
SSE code on the same P4s.  (And as I have already mentioned, PPC G5s are
even faster than that - up to 1.6M c/s - but they're more expensive.)

So this gives us a 5x performance increase.  As it relates to prices,
low-end computers based on P4 Celerons (which are not any slower than
"full" P4s for John) are likely cheaper than low-end general-purpose
FPGA-based cards, both in retail quantities.

> David Hulton claimed you'd be able to "crack password hashes as fast as
> 100+ PCs using FPGA PCMCIA cards on your laptop".

This claim could become the reality, but we're not quite there yet.

Slide 36, "Up & Coming", gives an estimate of 60M c/s for Picomon, the
most powerful card (of those listed) that would be usable in a laptop.
Given that the cards currently available from Pico Computing are priced
at around $2,500, my guess is that this new card, when released, is not
going to be cheaper.  (I'm sure it's a lot cheaper to produce, but
companies such as Pico Computing need to cover their development costs
and make a profit.)

If we're comparing this against desktop PCs, a similarly priced one
would be Apple's PowerMac G5 with dual 2.7 GHz processors.  It can do
over 3M c/s.  So the FPGA-based card is "only" 20 times faster (which is
still a lot!), not 100+ times faster.

> See http://www.ccc.de/congress/2004/fahrplan/event/244.en.html
> (IIRC, "basic functionality of john the ripper" merely implemented
> brute forcing a part of the key space.

This is a very important observation you make.  It's not only about c/s
rates, but also about the order in which candidate passwords are tried.
Much of John's success is due to its ability to try candidate passwords
in an optimal order.

With a PCMCIA card capable of hashing candidate passwords at a rate of
60 million per second, either the card itself will have to generate the
candidate passwords to try (in a far less optimal order) or the laptop's
CPU would become the bottleneck since it wouldn't be able to feed the
card with candidate passwords at this high a speed.

A similar problem exists with testing the computed hashes against a
large number of those loaded for cracking.  Perhaps the hash tables
(used to quickly locate potentially matching hashes) will have to be
loaded onto the FPGA card.

> But it should also be possible to let john create the password
> candidates, and calculate (and compare) the hashes using FPGA
> hardware, using an order of magnitude larger MAX_KEYS_PER_CRYPT
> value than for general purpose CPUs.)

Actually, fully-pipelined implementations of DES are not small, so you
can only fit a handful of them onto current FPGAs.  If I interpret the
numbers from David's slides correctly, he has been assuming 1 to 5
instances of DES per chip.  So MAX_KEYS_PER_CRYPT would need to be
rather small, unless a larger value is determined to help reduce the
communication overhead, etc.

David,

Some more comments on your publications:

http://www.picocomputing.com/press/KeyRecoveryServer.pdf

"World's Fastest Lanman/NTLM Key Recovery Server Shipped."

This press release says that the server can try over 500M LM keys per
second.  Very impressive indeed.  However, the claims that this is "250x
faster than a top of the line CPU" and the "12 hours vs. 136 days" once
again assume unoptimal software.  (I am sure this is unintentional.)

The current publicly available development version of John can do around
7M c/s at LM on modern P4s (2.8+ GHz) and 9.5M c/s at LM on G5 2.7 GHz.
So your special-purpose server (with 10 FPGA cards) appears to be
50 to 70 times faster than individual general-purpose CPUs.  (Curiously
enough, my "5x speedup" estimate from 6+ years ago still holds true.)

Also, I believe John's performance at LM hashes could be made 2-3 times
better if I would re-design it to try candidate passwords in an order
that is optimal for the low-level routines (essentially eliminating key
setup overhead).  Currently, John tries more likely passwords first,
which is highly desirable when using it to detect weak passwords.
Exhaustive key searches with no requirement to get the weakest passwords
cracked early on are quite a different task.

Please don't get me wrong, I find that you're doing the right thing and
I'd be interested in possible cooperation.  I just couldn't resist the
temptation to defend my software. ;-)

Thanks,

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ