Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 1 Dec 2007 05:50:37 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: AES Bitslice and the PS3 MD5 cracking.

On Fri, Nov 30, 2007 at 11:01:38PM +0000, Larry Bonner wrote:
> I hope Solar Designer doesn't mind me putting this question to some on
> the mailing list.

It's OK as long as these discussion threads don't run for too long, and
the majority of postings are still directly related to JtR.

> I'm just curious how complicated it would be to implement AES as a
> bitslice algorithm, as i've read that for CORE2 CPU, its actually
> faster than "normal" way to compute AES.

I haven't looked into this myself, but it's quite possible.  A Google
search for "bitslice AES" (without the quotes) gives some references,
including to a sci.crypt discussion that ended with:

"Short answer: yes, on architectures with 128 bits words or more."

As to your "how complicated" question, I think it's the same order of
complexity for most popular ciphers and cryptographic hashes that are
reasonable to implement in this way at all.  Things can get a lot more
difficult when you try to achieve a certain level of performance - such
as to outperform another implementation.

The URL for a presentation on PS3/Cell that I had posted a while ago:

	http://www.hyperelliptic.org/SPEED/slides/Osvik_cell-speed.pdf

also briefly describes a non-bitslice but architecture-specific
implementation of AES for the Cell and mentions that "bitslice AES would
be very interesting".

> Also, the comments on MD5 bitslice were interesting, did anyone notice
> the claims in the media recently that Nick Breese was able to compute
> 1.4 billion hashes per second on a PS3?

I've just Google'd around to figure out what this was about.  My guess
is that it's 1.4 billion partial computations of another cryptographic
primitive per second, not MD5.  Probably RC4, as many articles mention
Office "documents" and the like.

> Could anyone here verify how fast Solars MD5 implementation runs on PS3??

This makes little sense.  If you're referring to the proof-of-concept
bitslice implementation, then it's just that - a proof-of-concept.  As
you know, it's just C code, and not the most optimal at that.  More
importantly, we also know that Cell SPUs are good for implementing MD5
in the straightforward 4-way SIMD way.  If they lacked bit rotates or
something, things could be different.

As to the MD5 implementation in JtR (for MD5-based crypt(3) hashes), it
only uses 32-bit words, no SIMD.  The best it knows to do is mix
instructions for two MD5 computations for greater instruction-level
parallelism and to cover large instruction latencies.

Obviously, a future version of JtR should have improvements in this area.

> 1.4 billion p/s seems extremely fast..is it possible?

Probably not for MD5.  Although this article specifically mentions MD5:

	http://www.heise-security.co.uk/news/99674

it also almost correctly estimates that this would mean around 50 clock
cycles per "iteration".  If the main CPU is used in addition to the
SPEs, this required minimum number of cycles per "iteration" to achieve
1.4 billion "iterations" per second would increase - maybe even by as
much as 50% since the main CPU is multiple-issue.  So this gives us an
upper boundary for the number of cycles per "iteration" of around 80.

Well, a full MD5 compression function computation involves 64 rounds,
with each round consisting of 7-8 operations.  Even if only half the
rounds need to be computed (which might be possible when trying to match
one specific hash value and trying candidate passwords in a hashing
algorithm-dependent order) and each round is reduced to half the number
of instructions (due to availability of suitable multi-op instructions -
I have no idea if this is the case for the Cell - probably it is not),
this still gives us over 100 instructions per "iteration".  Since the
SPUs are single-issue, this is also the lower boundary for the number of
clock cycles.

Thus, no, 1.4 billion MD5s per second on a PS3 seems unrealistic, while
a few hundred million is realistic.

> has anyone seen Nicks presentation, or indeed code

I have not.

> to see if his claims are not just hype?

I doubt that he made such claims.  For example, he might have mentioned
MD5 in some context, then proceeded to give numbers for RC4.  The press
did the rest.  Of course, that's just my guess.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: 5B341F15  fp: B3FB 63F4 D7A3 BCCC 6F6E  FC55 A2FC 027C 5B34 1F15
http://www.openwall.com - bringing security into open computing environments

-- 
To unsubscribe, e-mail john-users-unsubscribe@...ts.openwall.com and reply
to the automated confirmation request that will be sent to you.

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ