Date: Fri, 15 Jun 2012 16:54:15 +0200
From: Tavis Ormandy <>
Subject: [patch] optional new raw sha1 implementation

[starting a new thread]

> On 2012-06-14 21:38, magnum wrote:
> > On 2012-06-14 16:32, Frank Dittrich wrote:
> > > On Thu, Jun 14, 2012 at 6:19 PM, Tavis
> > > Ormandy<> wrote:
> > > > p.s. I also have a sha-1 implementation that's a little faster than
> > > > the jumbo version, would this be the right list to send that to? Is
> > > > there a jumbo cvs repo I can checkout to patch against?
> > >
> > > Probably the latest git version is considerably faster than the last
> > > jumbo version.
> >
> > I was going to say "not much" but I just checked raw-sha1 and apparently
> > it's 33% faster. I'm not sure how that happened, from memory the code
> > changes only boosted it by like 10-12% (and this CPU does not support
> > XOP or other stuff that Solar added optimisations for).
> I tracked this down: I was remembering correctly but that was not compared
> to Jumbo-5 but to 80x4 [1] magnum-jumbo. The sha-1 intrinsics changes by
> Jim made about half of those 33%, and my optimisations of set_key() in the
> sha1 formats did the rest. I suppose Tavis improved the intrinsics code
> so these changes may well be worth looking at, and compare with the 16x4
> code Jim made. Even if Jim's code turns out to be faster, there may be
> some bits and pieces we can use.
> [1] The 80x4 vs 16x4 refers to the SSE-2 key buffer. In the older code by
> Simon, it's 80 bytes per candidate where 16 bytes are just scratch space.
> In Jim's code, it's 64 bytes just like MD4 and MD5, and the scratch space
> is on stack.
> magnum
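
To make the footnote's arithmetic concrete, here is a minimal sketch that takes
the quoted sizes at face value: 4 candidates per SSE2 vector, 80 bytes per
candidate in the older layout (16 of them scratch), 64 bytes in the newer one.
The type and constant names are hypothetical, and the flat byte arrays say
nothing about how the real formats interleave the lanes.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    SSE2_LANES        = 4,  /* candidates processed per SSE2 vector */
    OLD_KEY_BYTES     = 80, /* per candidate in the older layout (per the footnote) */
    OLD_SCRATCH_BYTES = 16, /* part of those 80 bytes that is only scratch space */
    NEW_KEY_BYTES     = 64  /* per candidate in the newer layout, one 64-byte block */
};

/* Hypothetical stand-ins for the two key buffers; only the sizes matter here. */
typedef struct { uint8_t bytes[SSE2_LANES * OLD_KEY_BYTES]; } old_key_buffer;
typedef struct { uint8_t bytes[SSE2_LANES * NEW_KEY_BYTES]; } new_key_buffer;

/* Bytes saved per group of four candidates by moving the scratch space out. */
size_t key_buffer_saving(void)
{
    return sizeof(old_key_buffer) - sizeof(new_key_buffer);
}
```

Under these assumptions the buffers come to 320 and 256 bytes, so the saving is
exactly the 4 x 16 bytes of scratch space that moved to the stack.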

I see, thanks for the information, magnum! I cloned from your git repo, and my
code still seems to be around 40% faster on most of my hardware. I'm not sure
you're going to be too happy, though: I didn't change any intrinsics, I
actually wrote a new plugin from scratch.

I know SHA-1 and SSE quite well, so I'm not afraid to dive in and start
hacking, but the existing code is quite hard to follow for an outsider!

I understand why: it needs to work well on lots of different silicon. But I
was hoping you might be convinced to include mine as an optional build that
might perform better on some machines.

Here are the numbers on my slow x32 xeon:

$ printf madmda16 | sha1sum | awk '{print $1}' > passwords
$ time ../run/john --format=rawsha1_sse4 passwords
Loaded 1 password hash (Raw SHA-1 (taviso sse4 build) [rawsha1_sse4])
madmda16         (?)
guesses: 1  time: 0:00:01:29 DONE (Fri Jun 15 16:01:37 2012)  c/s: 9612K  trying: madmda11 - madmda16
real    1m29.253s
user    1m28.767s
sys 0m0.049s

$ rm ../run/john.pot
$ time ../run/john --format=raw-sha1 passwords
Loaded 1 password hash (Raw SHA-1 [SSE2 4x])
madmda16         (?)
guesses: 1  time: 0:00:02:02 DONE (Fri Jun 15 16:43:14 2012)  c/s: 6973K  trying: madmda11 - madmda16

real    2m3.020s
user    2m2.356s
sys 0m0.084s

That's about 2.6M c/s faster (9612K vs. 6973K c/s, roughly 38%). I haven't
tested it on any AMD hardware; I imagine it will still be a bit faster there,
as I have some interesting logical optimisations as well as code
optimisations, but maybe not by as much.
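
For anyone who wants to check the comparison, here is a small sketch that
recomputes the gain from the two c/s figures reported in the runs above. The
function names are made up for illustration; only the two rates come from the
john output.

```c
#include <stdio.h>

/* Absolute gain in candidates per second. */
double rate_gain(double new_rate, double old_rate)
{
    return new_rate - old_rate;
}

/* Relative gain, as a percentage of the old rate. */
double rate_gain_percent(double new_rate, double old_rate)
{
    return (new_rate / old_rate - 1.0) * 100.0;
}

/* Print the comparison for the two runs quoted above. */
void print_comparison(void)
{
    const double sse4_rate = 9612e3; /* c/s reported for rawsha1_sse4 */
    const double sse2_rate = 6973e3; /* c/s reported for raw-sha1 */

    printf("gain: %.0fK c/s (%.1f%%)\n",
           rate_gain(sse4_rate, sse2_rate) / 1e3,
           rate_gain_percent(sse4_rate, sse2_rate));
}
```

Calling print_comparison() from a main() reports the difference in thousands
of c/s and as a percentage. This is a single run on a single machine, so treat
it as a rough indication rather than a benchmark.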

As the code is self contained, I've just attached my first attempt. Let me know
if you have any feedback, or suggestions to help get it merged.

I've only tested it on Linux with GCC, and you need to add -msse4 to CFLAGS in the Makefile.
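
If you want to confirm that the flag took effect, the following sketch may
help. It relies on GCC defining __SSE4_1__ when -msse4 (or -msse4.1) is in
effect, and on the GCC/Clang x86 builtin __builtin_cpu_supports() for a
runtime check. The two function names are hypothetical and are not part of
the attached plugin.

```c
#include <stdio.h>

/* Returns 1 if this translation unit was compiled with SSE4.1 enabled
 * (for example via -msse4 in CFLAGS), 0 otherwise. */
int built_with_sse41(void)
{
#if defined(__SSE4_1__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the running CPU reports SSE4.1, 0 if it does not, and -1
 * when the check is unavailable (non-x86 target or unsupported compiler). */
int cpu_has_sse41(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* harmless if CPU detection already ran */
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return -1;
#endif
}
```

A build could use the first function (or the macro directly) to decide whether
to compile the format in at all, and the second to refuse to run it on a CPU
that lacks the instructions.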

The code is original; I can assign copyright to Solar if required.


p.s. I can live without sse4 if it's a deal breaker, but I use it on
quite a hot path.

------------------------------------- | pgp encrypted mail preferred

View attachment "taviso_fmt_plug.c" of type "text/plain" (24746 bytes)
