Date: Mon, 17 Oct 2005 10:17:49 -0700
From: h1kari <>
Subject: Re: Using Hardwareaccelerators to speed up John

Solar Designer wrote:
> Yes.  So that's up to 500M c/s at LM for a single chip, but only when
> cracking one or two hashes (according to a table on your website).
> That's very impressive.
> However, reading it another way, the key rate becomes 4 times lower
> with the number of hashes loaded for cracking increased to 128, and you
> currently do not even support loading more than 128.  Is that correct?
> If so, starting with a few thousand LM hashes to crack, John running
> on a single modern CPU would outperform your implementation on a single
> Pico E-12 LO.  Is that correct?

Yeah, I would think so. If you're comparing against a few thousand hashes
after each computation of crypt(), wouldn't that slow you down a bit
compared to the performance figures you gave earlier? The core could be
adapted to compare against more hashes, but at that point we would have to
start slowing down the algorithm on the card so that the comparisons can
keep up with it.
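
Solar's point about loading many hashes depends on how the comparison is
done. On a CPU, each computed hash can be checked against thousands of
loaded hashes with a hash-table lookup, so the cost per computation barely
grows with the number of hashes. Comparing against each loaded hash in turn
costs time (or logic) proportional to their number. A minimal sketch
contrasting the two, assuming hashes are modeled as 64-bit integers; this
is an illustration, not John's actual code:

```python
from typing import Iterable, List, Set


def check_linear(computed: int, loaded: List[int]) -> bool:
    # One comparison per loaded hash: cost grows with len(loaded).
    return any(computed == h for h in loaded)


def build_index(loaded: Iterable[int]) -> Set[int]:
    # Built once, before cracking starts.
    return set(loaded)


def check_indexed(computed: int, index: Set[int]) -> bool:
    # Average-case constant-time lookup, however many hashes are loaded.
    return computed in index


# Example: 5,000 loaded hashes (arbitrary 64-bit values for illustration).
loaded = [0x123456789ABCDEF0 + i for i in range(5000)]
index = build_index(loaded)
print(check_linear(loaded[4321], loaded), check_indexed(loaded[4321], index))  # True True
print(check_indexed(0, index))  # False
```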

>>For Unix DES, it would essentially be the Lanman
>>performance / 25, since Unix DES requires 25 rounds, so the max
>>performance of our card is currently ~50M c/s, which is a little less
>>than my projected number in the slides.
> How do you derive that ~50M c/s figure?  500M c/s at LM would translate
> to roughly 20M c/s at traditional crypt(3), no?

Oh sorry, you're right.

> That's still very impressive indeed.  However, the limitation on the
> number of hashes which may be cracked at a time is really unfortunate.
> Perhaps it will be easier to overcome for slower hashes such as crypt(3)
> where you would not make things a lot slower by having the comparisons
> actually take a few extra cycles.  (And you would probably not have too
> many hashes with the same salt anyway.)

Yeah. It would be a lot better if we could do the compares on a host
processor so we don't have to tie up a bunch of logic on the FPGA for it.
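
The remark about salts can be made concrete: with a salted hash, each
computed value only needs to be compared against the loaded hashes that
share its salt, so the comparisons per computation stay few even when many
hashes are loaded. A minimal sketch of grouping by salt on the host;
`fake_crypt` is a hypothetical stand-in for crypt(3) (it uses truncated
SHA-256 and is not compatible with the real thing):

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def fake_crypt(candidate: str, salt: str) -> int:
    # Hypothetical stand-in for a salted crypt(3) computation (NOT DES-based
    # crypt): the first 8 bytes of SHA-256 over salt + candidate.
    digest = hashlib.sha256((salt + candidate).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def group_by_salt(entries: Iterable[Tuple[str, int]]) -> Dict[str, Set[int]]:
    # Map each salt to the set of target hashes that use it.
    groups: Dict[str, Set[int]] = defaultdict(set)
    for salt, target in entries:
        groups[salt].add(target)
    return dict(groups)


def crack(candidates: List[str], groups: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    found = []
    for salt, targets in groups.items():
        for cand in candidates:
            # One hash computation per (salt, candidate) pair, checked only
            # against the hashes that share this salt.
            if fake_crypt(cand, salt) in targets:
                found.append((salt, cand))
    return found


entries = [
    ("ab", fake_crypt("secret", "ab")),
    ("ab", fake_crypt("hunter2", "ab")),
    ("zz", fake_crypt("letmein", "zz")),
]
groups = group_by_salt(entries)
print(sorted(crack(["secret", "letmein", "password"], groups)))
# [('ab', 'secret'), ('zz', 'letmein')]
```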

>>Yeah. I'm sorry it ended up coming out comparing directly to the
>>functionality of John. The idea I was trying to get across was that when
>>most people think of password cracking, they think of john, and I was
>>doing something similar.
> The comparison is fine, but a few clarifications such as those I've
> given would be in order. ;-)


>>Right now our only
>>niche with this project is for passwords that can't be easily cracked by
>>John or L0phtcrack.
> Understood.  Is there much legitimate demand for that, though, when
> we're talking OS login passwords?  Penetration testing?

We sold our box to a company that was interested in demonstrating the
insecurity of Windows password hashes. It effectively demonstrated that
even with completely random passwords, they could be broken in a short
amount of time with the right hardware, so that was valuable to them
purely for demonstration purposes. I don't know if they're actually
going to use it for pen testing as well.

> I am not aware of such a resource with accurate and up-to-date
> information.  For John specifically, I'd be happy to provide any
> performance numbers you may be interested in.

That would be great.

>>As far as future work. We've been doing a lot of research with the
>>Virtex-4 FX cards and the onboard PowerPCs and we see a lot of potential
>>for using the APU bus to provide custom instructions to software (john)
>>that would allow you to accelerate your DES and other functions with
>>single instruction calls.
> I've checked out this webpage:
> and, as far as I understand, the APU bus is internal to the FPGA chip,
> so are you suggesting to run an embedded Linux and John on one or both
> of the embedded PowerPCs?

Yeah, I was suggesting that mainly because of the latency and overhead
involved in talking to a host processor. Right now, if we're generating
20M c/s, that's roughly 1.2Gbps of passwords going to the card and
another 1.28Gbps of hashes coming back. For Lanman/NTLM it's 25x that.
The only buses that I think could keep up with that would be maybe the
high-end 16x PCI Express or high-end supercomputing interconnects like
NUMAlink.
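
The figures in this exchange can be reproduced with a little arithmetic.
A minimal sketch, assuming 8-character candidates packed at 7 bits per
character (56 bits) going to the card and one 64-bit DES output block per
computation coming back; the 25 reflects the 25 DES encryptions performed
by traditional crypt(3):

```python
LM_RATE = 500e6            # c/s at LM for a single chip, as quoted in the thread
CRYPT_DES_ITERATIONS = 25  # traditional crypt(3) performs 25 DES encryptions

# Assumed per-computation sizes (see the lead-in).
KEY_BITS = 56   # 8 characters x 7 bits, sent to the card
HASH_BITS = 64  # one DES output block, sent back to the host


def link_gbps(rate_cps: float, bits_per_item: int) -> float:
    """Bandwidth in Gbit/s for rate_cps items/s of bits_per_item bits each."""
    return rate_cps * bits_per_item / 1e9


crypt_rate = LM_RATE / CRYPT_DES_ITERATIONS  # 20M c/s, not ~50M c/s

for name, rate in (("crypt(3)", crypt_rate), ("LM", LM_RATE)):
    print(f"{name}: {rate / 1e6:.0f}M c/s, "
          f"to card {link_gbps(rate, KEY_BITS):.2f} Gbit/s, "
          f"back {link_gbps(rate, HASH_BITS):.2f} Gbit/s")
# crypt(3): 20M c/s, to card 1.12 Gbit/s, back 1.28 Gbit/s
# LM: 500M c/s, to card 28.00 Gbit/s, back 32.00 Gbit/s
```

Under these assumptions the return path matches the 1.28Gbps figure
exactly, and the 56-bit input gives 1.12 Gbit/s, close to the "roughly
1.2Gbps" quoted above.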

>>I don't know how much this would speed up john
>>considering the onboard PowerPCs can only be clocked up to 450MHz, but
>>it seems like it would at least be a bit of a speed improvement over
>>doing the crypto in software. Your comments on this would be really
> Yes, this would probably result in at least some speedup for some hash
> types, however it'd be tricky to manage (e.g., no host filesystem access
> from the embedded OS and vice versa), and not being able to use the
> host's more powerful CPU(s) and bigger memory sounds like a waste.
> I think it'd be better to continue running most of John on the host
> system, but have it communicate with its low-level parts in the FPGA.
> It is not obvious whether the embedded PowerPCs would allow to simplify
> or speedup such communication.  A possible use for them would be to
> implement the logic of weird algorithms such as the FreeBSD-style
> MD5-based crypt(3), leaving the precious logic cells to more instances
> of MD5 itself.

Yeah, that's true. The other thing is that we could use the APU bus for
specific operations inside some of the algorithms that could be sped up,
similar to your SSE or AltiVec optimizations but more geared towards
crypto. Our higher-end cards do have 256MB of DDR2 RAM that runs at up to
600MHz, I believe, and 64MB of flash. We plan to build one in the near
future with 2GB+ of user flash, which may be useful for wordlist storage
and such.

>>Also, if we were able to provide the hardware end of this to you guys,
>>would you be interested in tying it into john?
> I personally would definitely be interested in doing that.  However, I
> wouldn't be able to dedicate a lot of time to it unless the project
> would also be (expected to be) successful commercially.  (This is a
> topic we can discuss off-list.)

Sounds good.

> Perhaps we could start by having the FPGA card do the bare minimum and
> having John running on the host system communicate candidate passwords
> and salts to the card and computed hashes back.  It won't be very fast,
> but it's likely the quickest way to get us started.  We can do it for
> just one hash type initially (perhaps one of the Unix hashes since we
> need it to not be very fast).

Yeah, that sounds like a good starting point which would be fairly
trivial right now. I could get something set up for you and send you a
card + driver in the next couple of weeks if you're really interested.
We're still working on getting Linux running fully on our boards, but I
expect APU john optimizations would be trivial once that's completed in
the next couple of months.
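
The proposed starting point (John on the host sends candidate passwords
and a salt to the card, the card sends computed hashes back) can be
sketched as a batched request/response exchange. Everything here is
hypothetical: the frame layout, `SoftwareCard`, and its `compute()` method
stand in for whatever the real driver interface would be, and the hash is
SHA-256 truncated to 64 bits rather than crypt(3). Batching is used
because, as discussed above, per-call latency and bus overhead are the
limiting factors:

```python
import hashlib
import struct
from typing import List, Set

CAND_LEN = 8  # hypothetical: candidates null-padded to 8 bytes
HASH_LEN = 8  # hypothetical: one 64-bit result per candidate


def pack_request(salt: bytes, candidates: List[bytes]) -> bytes:
    """Hypothetical frame: 2-byte salt, 2-byte big-endian count, padded candidates."""
    if len(salt) != 2:
        raise ValueError("salt must be 2 bytes")
    body = b"".join(c[:CAND_LEN].ljust(CAND_LEN, b"\0") for c in candidates)
    return salt + struct.pack(">H", len(candidates)) + body


class SoftwareCard:
    """Software stand-in for the FPGA card; a real driver would replace compute()."""

    def compute(self, request: bytes) -> bytes:
        salt = request[:2]
        (count,) = struct.unpack(">H", request[2:4])
        out = []
        for i in range(count):
            cand = request[4 + i * CAND_LEN:4 + (i + 1) * CAND_LEN]
            # NOT crypt(3): truncated SHA-256, for illustration only.
            out.append(hashlib.sha256(salt + cand).digest()[:HASH_LEN])
        return b"".join(out)


def crack_batch(card: SoftwareCard, salt: bytes, candidates: List[bytes],
                targets: Set[bytes]) -> List[bytes]:
    """Send one batch, read the hashes back, and do the comparisons on the host."""
    reply = card.compute(pack_request(salt, candidates))
    hashes = [reply[i * HASH_LEN:(i + 1) * HASH_LEN] for i in range(len(candidates))]
    return [c for c, h in zip(candidates, hashes) if h in targets]


card = SoftwareCard()
salt = b"ab"
target = card.compute(pack_request(salt, [b"secret"]))  # 8-byte hash of "secret"
print(crack_batch(card, salt, [b"password", b"secret", b"letmein"], {target}))
# [b'secret']
```

Keeping the comparisons on the host means the card needs no compare logic
at all, at the cost of every computed hash crossing the bus.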

