Date: Sun, 18 Mar 2012 06:29:29 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: XOP for MD5/MD4/SHA-1

magnum -

On Sun, Mar 18, 2012 at 05:27:53AM +0400, Solar Designer wrote:
> Note that I haven't modified MD4 and SHA-1 to actually use XOP yet ...

I did so now.  I've attached a patch covering all three: MD5, MD4, and SHA-1.

> ... for raw MD5 para_2 was a lot better than para_3 (but the latter
> is better for MD5-crypt).

I was wrong about that: somehow I had not noticed the even better
MD5-crypt speed there.  I am getting it now:

Benchmarking: FreeBSD MD5 [SSE2i 8x]... (8xOMP) DONE
Raw:    203013 c/s real, 25426 c/s virtual

Now this is significantly better than a Core i7-2600, which IIRC gives
under 160k c/s on this test.  (Both CPUs were benchmarked at stock clocks.)

And here's what I am getting for the raw hashes.  With -x86-64i (Intel
compiler's SSE2 code):

Benchmarking: Raw MD5 [SSE2i 12x]... DONE
Raw:    32896K c/s real, 32682K c/s virtual

Benchmarking: Raw MD4 [SSE2i 12x]... DONE
Raw:    37282K c/s real, 37282K c/s virtual

Benchmarking: Raw SHA-1 [SSE2i 8x]... DONE
Raw:    18236K c/s real, 18236K c/s virtual

With -x86-64 (gcc's SSE2 code):

Benchmarking: Raw MD5 [SSE2i 12x]... DONE
Raw:    24432K c/s real, 24197K c/s virtual

Benchmarking: Raw MD4 [SSE2i 12x]... DONE
Raw:    34473K c/s real, 34473K c/s virtual

Benchmarking: Raw SHA-1 [SSE2i 8x]... DONE
Raw:    17567K c/s real, 17567K c/s virtual

With -x86-64-avx:

Benchmarking: Raw MD5 [SSE2i 12x]... DONE
Raw:    23301K c/s real, 23087K c/s virtual

Benchmarking: Raw MD4 [SSE2i 12x]... DONE
Raw:    35444K c/s real, 35696K c/s virtual

Benchmarking: Raw SHA-1 [SSE2i 8x]... DONE
Raw:    19284K c/s real, 19284K c/s virtual

Finally, the improvement with -x86-64-xop (due to this patch):

Benchmarking: Raw MD5 [SSE2i 8x]... DONE
Raw:    32577K c/s real, 32577K c/s virtual

Benchmarking: Raw MD4 [SSE2i 8x]... DONE
Raw:    36872K c/s real, 36872K c/s virtual

Benchmarking: Raw SHA-1 [SSE2i 8x]... DONE
Raw:    23464K c/s real, 23464K c/s virtual

So with XOP, raw MD5 and raw MD4 are on par with the Intel compiler
code's speed, whereas raw SHA-1 is now about 28% faster than Intel's
build and about 21% faster than the AVX build.
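
For reference, a minimal scalar sketch (illustration only, not taken
from the patch) of why XOP helps here: the MD5/MD4/SHA-1 F function
F(x,y,z) = (x & y) | (~x & z) is a bitwise select of y or z under mask
x.  The SSE2 code computes it as z ^ (x & (y ^ z)) in three
instructions, whereas XOP's VPCMOV, via _mm_cmov_si128(y, z, x), does
the select in one; MD5's G function is handled the same way with a
different selector.  Likewise, _mm_roti_epi32() is a single VPROTD on
XOP and falls back to the usual two-shifts-plus-OR sequence elsewhere.

/* Illustration only: scalar check that the two F formulations agree. */
#include <assert.h>
#include <stdint.h>

static uint32_t f_sse2_style(uint32_t x, uint32_t y, uint32_t z)
{
	return z ^ (x & (y ^ z));	/* xor, and, xor: the SSE2 path */
}

static uint32_t f_cmov_style(uint32_t x, uint32_t y, uint32_t z)
{
	return (y & x) | (z & ~x);	/* what VPCMOV(y, z, x) computes */
}

int main(void)
{
	uint32_t x = 0xdeadbeef, y = 0x01234567, z = 0x89abcdef;

	assert(f_sse2_style(x, y, z) == f_cmov_style(x, y, z));
	return 0;
}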

Alexander

diff --git a/src/sse-intrinsics.c b/src/sse-intrinsics.c
index 5f847da..9cb301a 100644
--- a/src/sse-intrinsics.c
+++ b/src/sse-intrinsics.c
@@ -3,15 +3,24 @@
  * Redistribution and use in source and binary forms, with or without modification, are permitted.
  *
  * New (optional) SHA1 version by JimF 2011, using 16x4 buffer.
+ * Use of XOP intrinsics added by Solar Designer, 2012.
  */
 
 #include "arch.h"
 #include <string.h>
 #include <emmintrin.h>
+#ifdef __XOP__
+#include <x86intrin.h>
+#endif
 #include "memory.h"
 #include "md5.h"
 #include "MD5_std.h"
 
+#ifndef __XOP__
+#define _mm_roti_epi32(a, s) \
+	_mm_or_si128(_mm_slli_epi32((a), (s)), _mm_srli_epi32((a), 32-(s)))
+#endif
+
 #ifndef MMX_COEF
 #define MMX_COEF 4
 #endif
@@ -20,15 +29,25 @@
 #define MD5_SSE_NUM_KEYS	(MMX_COEF*MD5_SSE_PARA)
 #define MD5_PARA_DO(x)	for((x)=0;(x)<MD5_SSE_PARA;(x)++)
 
+#ifdef __XOP__
+#define MD5_F(x,y,z) \
+	MD5_PARA_DO(i) tmp[i] = _mm_cmov_si128((y[i]),(z[i]),(x[i]));
+#else
 #define MD5_F(x,y,z) \
 	MD5_PARA_DO(i) tmp[i] = _mm_xor_si128((y[i]),(z[i])); \
 	MD5_PARA_DO(i) tmp[i] = _mm_and_si128((tmp[i]),(x[i])); \
 	MD5_PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]),(z[i]));
+#endif
 
+#ifdef __XOP__
+#define MD5_G(x,y,z) \
+	MD5_PARA_DO(i) tmp[i] = _mm_cmov_si128((x[i]),(y[i]),(z[i]));
+#else
 #define MD5_G(x,y,z) \
 	MD5_PARA_DO(i) tmp[i] = _mm_xor_si128((y[i]),(x[i])); \
 	MD5_PARA_DO(i) tmp[i] = _mm_and_si128((tmp[i]),(z[i])); \
 	MD5_PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]), (y[i]) );
+#endif
 
 #define MD5_H(x,y,z) \
 	MD5_PARA_DO(i) tmp[i] = _mm_xor_si128((y[i]),(z[i])); \
@@ -44,7 +63,7 @@
 	f((b),(c),(d)) \
 	MD5_PARA_DO(i) a[i] = _mm_add_epi32( a[i], tmp[i] ); \
 	MD5_PARA_DO(i) a[i] = _mm_add_epi32( a[i], data[i*16+x] ); \
-	MD5_PARA_DO(i) a[i] = _mm_or_si128(_mm_slli_epi32((a[i]), (s)), _mm_srli_epi32((a[i]), 32-(s))); \
+	MD5_PARA_DO(i) a[i] = _mm_roti_epi32( a[i], (s) ); \
 	MD5_PARA_DO(i) a[i] = _mm_add_epi32( a[i], b[i] );
 
 unsigned int debug = 0;
@@ -415,10 +434,15 @@ void md5cryptsse(unsigned char pwd[MD5_SSE_NUM_KEYS][16], unsigned char * salt,
 #define MD4_SSE_NUM_KEYS	(MMX_COEF*MD4_SSE_PARA)
 #define MD4_PARA_DO(x)	for((x)=0;(x)<MD4_SSE_PARA;(x)++)
 
+#ifdef __XOP__
+#define MD4_F(x,y,z) \
+	MD4_PARA_DO(i) tmp[i] = _mm_cmov_si128((y[i]),(z[i]),(x[i]));
+#else
 #define MD4_F(x,y,z) \
 	MD4_PARA_DO(i) tmp[i] = _mm_xor_si128((y[i]),(z[i])); \
 	MD4_PARA_DO(i) tmp[i] = _mm_and_si128((tmp[i]),(x[i])); \
 	MD4_PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]),(z[i]));
+#endif
 
 #define MD4_G(x,y,z) \
 	MD4_PARA_DO(i) tmp[i] = _mm_or_si128((y[i]),(z[i])); \
@@ -435,7 +459,7 @@ void md5cryptsse(unsigned char pwd[MD5_SSE_NUM_KEYS][16], unsigned char * salt,
 	f((b),(c),(d)) \
 	MD4_PARA_DO(i) a[i] = _mm_add_epi32( a[i], tmp[i] ); \
 	MD4_PARA_DO(i) a[i] = _mm_add_epi32( a[i], data[i*16+x] ); \
-	MD4_PARA_DO(i) a[i] = _mm_or_si128(_mm_slli_epi32((a[i]), (s)), _mm_srli_epi32((a[i]), 32-(s)));
+	MD4_PARA_DO(i) a[i] = _mm_roti_epi32( a[i], (s) );
 
 void SSEmd4body(__m128i* data, unsigned int * out, int init)
 {
@@ -559,10 +583,15 @@ void SSEmd4body(__m128i* data, unsigned int * out, int init)
 #define SHA1_SSE_NUM_KEYS	(MMX_COEF*SHA1_SSE_PARA)
 #define SHA1_PARA_DO(x)		for((x)=0;(x)<SHA1_SSE_PARA;(x)++)
 
+#ifdef __XOP__
+#define SHA1_F(x,y,z) \
+	SHA1_PARA_DO(i) tmp[i] = _mm_cmov_si128((y[i]),(z[i]),(x[i]));
+#else
 #define SHA1_F(x,y,z) \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128((y[i]),(z[i])); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_and_si128((tmp[i]),(x[i])); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]),(z[i]));
+#endif
 
 #define SHA1_G(x,y,z) \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128((y[i]),(z[i])); \
@@ -584,16 +613,16 @@ void SSEmd4body(__m128i* data, unsigned int * out, int init)
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( data[i*80+t-3], data[i*80+t-8] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*80+t-14] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*80+t-16] ); \
-	SHA1_PARA_DO(i) data[i*80+t] = _mm_or_si128(_mm_slli_epi32((tmp[i]), 1), _mm_srli_epi32((tmp[i]), 31));
+	SHA1_PARA_DO(i) data[i*80+t] = _mm_roti_epi32(tmp[i], 1);
 
 #define SHA1_ROUND(a,b,c,d,e,F,t) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], data[i*80+t] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2)); \
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 
 void SSESHA1body(__m128i* data, unsigned int * out, unsigned int * reload_state, int input_layout_output)
 {
@@ -776,82 +805,82 @@ void SSESHA1body(__m128i* data, unsigned int * out, unsigned int * reload_state,
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( data[i*16+t-3], data[i*16+t-8] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-14] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-16] ); \
-	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_or_si128(_mm_slli_epi32((tmp[i]), 1), _mm_srli_epi32((tmp[i]), 31));
+	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_roti_epi32(tmp[i], 1);
 #define SHA1_EXPAND2b(t) \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmpR[i*16+((t-3)&0xF)], data[i*16+t-8] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-14] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-16] ); \
-	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_or_si128(_mm_slli_epi32((tmp[i]), 1), _mm_srli_epi32((tmp[i]), 31));
+	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_roti_epi32(tmp[i], 1);
 #define SHA1_EXPAND2c(t) \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmpR[i*16+((t-3)&0xF)], tmpR[i*16+((t-8)&0xF)] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-14] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-16] ); \
-	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_or_si128(_mm_slli_epi32((tmp[i]), 1), _mm_srli_epi32((tmp[i]), 31));
+	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_roti_epi32(tmp[i], 1);
 #define SHA1_EXPAND2d(t) \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmpR[i*16+((t-3)&0xF)], tmpR[i*16+((t-8)&0xF)] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], tmpR[i*16+((t-14)&0xF)] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], data[i*16+t-16] ); \
-	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_or_si128(_mm_slli_epi32((tmp[i]), 1), _mm_srli_epi32((tmp[i]), 31));
+	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_roti_epi32(tmp[i], 1);
 #define SHA1_EXPAND2(t) \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmpR[i*16+((t-3)&0xF)], tmpR[i*16+((t-8)&0xF)] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], tmpR[i*16+((t-14)&0xF)] ); \
 	SHA1_PARA_DO(i) tmp[i] = _mm_xor_si128( tmp[i], tmpR[i*16+((t-16)&0xF)] ); \
-	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_or_si128(_mm_slli_epi32((tmp[i]), 1), _mm_srli_epi32((tmp[i]), 31));
+	SHA1_PARA_DO(i) tmpR[i*16+((t)&0xF)] = _mm_roti_epi32(tmp[i], 1);
 
 #define SHA1_ROUND2a(a,b,c,d,e,F,t) \
 	SHA1_EXPAND2a(t+16) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], data[i*16+t] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2));
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 #define SHA1_ROUND2b(a,b,c,d,e,F,t) \
 	SHA1_EXPAND2b(t+16) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], data[i*16+t] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2));
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 #define SHA1_ROUND2c(a,b,c,d,e,F,t) \
 	SHA1_EXPAND2c(t+16) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], data[i*16+t] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2));
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 #define SHA1_ROUND2d(a,b,c,d,e,F,t) \
 	SHA1_EXPAND2d(t+16) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], data[i*16+t] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2));
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 #define SHA1_ROUND2(a,b,c,d,e,F,t) \
 	SHA1_PARA_DO(i) tmp3[i] = tmpR[i*16+(t&0xF)]; \
 	SHA1_EXPAND2(t+16) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp3[i] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2));
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 #define SHA1_ROUND2x(a,b,c,d,e,F,t) \
 	F(b,c,d) \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
-	SHA1_PARA_DO(i) tmp[i] = _mm_or_si128(_mm_slli_epi32((a[i]), 5), _mm_srli_epi32((a[i]), 27)); \
+	SHA1_PARA_DO(i) tmp[i] = _mm_roti_epi32(a[i], 5); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmp[i] ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], cst ); \
 	SHA1_PARA_DO(i) e[i] = _mm_add_epi32( e[i], tmpR[i*16+(t&0xF)] ); \
-	SHA1_PARA_DO(i) b[i] = _mm_or_si128(_mm_slli_epi32((b[i]), 30), _mm_srli_epi32((b[i]), 2));
+	SHA1_PARA_DO(i) b[i] = _mm_roti_epi32(b[i], 30);
 
 void SSESHA1body(__m128i* data, unsigned int * out, unsigned int * reload_state, int input_layout_output)
 {
diff --git a/src/x86-64.h b/src/x86-64.h
index c27c770..5b149b3 100644
--- a/src/x86-64.h
+++ b/src/x86-64.h
@@ -198,7 +198,7 @@
 #elif defined(__GNUC__) && GCC_VERSION < 40500	// 4.5.0
 #define MD5_SSE_PARA			3
 #define MD5_N_STR			"12x"
-#elif defined(__GNUC__) && GCC_VERSION < 40600	// 4.6.0
+#elif defined(__GNUC__) && (GCC_VERSION < 40600 || defined(__XOP__)) // 4.6.0
 #define MD5_SSE_PARA			2
 #define MD5_N_STR			"8x"
 #else
@@ -220,7 +220,7 @@
 #elif defined(__GNUC__) && GCC_VERSION < 40500	// 4.5.0
 #define MD4_SSE_PARA			3
 #define MD4_N_STR			"12x"
-#elif defined(__GNUC__) && GCC_VERSION < 40600	// 4.6.0
+#elif defined(__GNUC__) && (GCC_VERSION < 40600 || defined(__XOP__)) // 4.6.0
 #define MD4_SSE_PARA			2
 #define MD4_N_STR			"8x"
 #else
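
In case it helps with review, here's a minimal standalone check of the
non-XOP _mm_roti_epi32() fallback (illustration only, not part of the
patch; SSE2 is baseline on x86-64, so plain gcc -O2 is enough to build it):

/* Illustration only: verify that the two-shifts-plus-OR rotate
 * emulation from the patch matches a plain scalar rotate for the
 * rotate counts SHA-1 actually uses. */
#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

/* Same emulation the patch installs when __XOP__ is not defined. */
#define roti_epi32(a, s) \
	_mm_or_si128(_mm_slli_epi32((a), (s)), _mm_srli_epi32((a), 32 - (s)))

static uint32_t rotl32(uint32_t v, int s)
{
	return (v << s) | (v >> (32 - s));
}

static int check(uint32_t v, int s, __m128i r)
{
	return (uint32_t)_mm_cvtsi128_si32(r) == rotl32(v, s);
}

int main(void)
{
	const uint32_t v = 0x12345678;
	__m128i x = _mm_set1_epi32((int)v);
	int ok = check(v, 1, roti_epi32(x, 1)) &&
	    check(v, 5, roti_epi32(x, 5)) &&
	    check(v, 30, roti_epi32(x, 30));

	printf("rotate emulation %s\n", ok ? "OK" : "BROKEN");
	return !ok;
}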
