Date: Wed, 21 Mar 2012 09:13:54 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: MD5 optimizations

magnum, all -

While searching the web to see whether anyone is already using SSSE3
PSHUFB for S-box lookups (it turns out that yes, at least for AES), I
found these forum threads:

http://hashcat.net/forum/thread-153.html
http://www.freerainbowtables.com/phpBB3/viewtopic.php?f=6&t=904&start=60#p15387
http://www.cryptohaze.com/forum/viewtopic.php?f=4&t=147#p865

These mention some MD5 optimizations that I previously did not consider.

There are some rotates by 16 in MD5.  The attached patch optimizes those
for SSE2 (two instructions) and SSSE3 (one instruction).  Either of
these gives a speedup of around 1% on the 2xE5420 system I use for
testing, with the SSSE3 version being slightly faster.  (There's little
point in testing this on my Bulldozer, because it is XOP-capable and is
likely faster running the XOP code instead.)
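
For readers without the attachment at hand, here is a minimal sketch of
the rotate-by-16 idea. It is not the attached patch: the function names
and the self-check are made up for illustration, and the instruction
choices (PSHUFLW + PSHUFHW for SSE2, PSHUFB with a constant mask for
SSSE3) are only an assumption about how "two instructions" and "one
instruction" can be achieved. A rotate by 16 of a 32-bit word is just a
swap of its two 16-bit halves, so it can be done with shuffles instead
of the usual shift/shift/or sequence.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif
#if defined(__SSSE3__)
#include <tmmintrin.h>  /* SSSE3 intrinsics */
#endif

/* Scalar reference: rotate a 32-bit word left by 16 bits. */
static uint32_t rot16_scalar(uint32_t x)
{
    return (x << 16) | (x >> 16);
}

#if defined(__SSE2__)
/* SSE2 sketch: swap the 16-bit halves of each 32-bit lane.
 * 0xb1 selects words 1,0,3,2; one shuffle handles the low 64 bits,
 * the other handles the high 64 bits. */
static __m128i rot16_sse2(__m128i x)
{
    x = _mm_shufflelo_epi16(x, 0xb1);
    return _mm_shufflehi_epi16(x, 0xb1);
}
#endif

#if defined(__SSSE3__)
/* SSSE3 sketch: a single byte shuffle. Destination byte i takes source
 * byte mask[i]; per 32-bit lane the byte order becomes 2,3,0,1. */
static __m128i rot16_ssse3(__m128i x)
{
    const __m128i mask = _mm_set_epi8(13, 12, 15, 14, 9, 8, 11, 10,
                                      5, 4, 7, 6, 1, 0, 3, 2);
    return _mm_shuffle_epi8(x, mask);
}
#endif

/* Hypothetical self-check: compares every vector variant that was
 * compiled in against the scalar reference and returns how many
 * variants (including the scalar one) were exercised. */
static int rot16_self_check(void)
{
    const uint32_t in[4] = { 0x12345678u, 0x00000001u,
                             0xdeadbeefu, 0xffff0000u };
    uint32_t out[4] = { 0, 0, 0, 0 };
    int i = 0, variants = 1;

    assert(rot16_scalar(0x12345678u) == 0x56781234u);
#if defined(__SSE2__)
    _mm_storeu_si128((__m128i *)out,
                     rot16_sse2(_mm_loadu_si128((const __m128i *)in)));
    for (i = 0; i < 4; i++)
        assert(out[i] == rot16_scalar(in[i]));
    variants++;
#endif
#if defined(__SSSE3__)
    _mm_storeu_si128((__m128i *)out,
                     rot16_ssse3(_mm_loadu_si128((const __m128i *)in)));
    for (i = 0; i < 4; i++)
        assert(out[i] == rot16_scalar(in[i]));
    variants++;
#endif
    (void)in; (void)out; (void)i;
    return variants;
}
```

The SSSE3 variant is only compiled when the compiler is told to target
SSSE3 (e.g. gcc -mssse3); otherwise the sketch falls back to whatever
subset is available.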

I only tested this with gcc so far, so the resulting code is slower than
the icc precompiled version in all of my tests.  I think the *.S files
need to be re-generated with icc (perhaps for SSE2 only for simplicity?)
after applying this patch.

Another possible optimization is a common subexpression elimination in
round 3:

http://hashcat.net/forum/thread-153-post-709.html#pid709

but it might not always be helpful ("The second optimization is not good
because it trades a PXOR rd,rs for a MOVDQA rd,rs. Now with AVX this
might be useful because it let's you do VPXOR rd,rs1,rs2 (rd = rs1^rs2).
Even with AVX you need more registers because you now need two temp
registers per interlace ..." from a comment by Sc00bz).
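
To make the trade-off concrete, here is a scalar sketch of one way such
a common subexpression can be shared in round 3. This is an illustration
only, not necessarily the exact transformation described in the linked
post: the helper names are hypothetical, and the message words are
arbitrary test values. Round 3 uses H(x,y,z) = x ^ y ^ z, and two
consecutive steps compute b ^ c ^ d and then a ^ b ^ c, so b ^ c can be
computed once and kept in a temporary. In SIMD code that temporary is
the extra register (and the extra move) the quoted comment refers to.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int s)
{
    return (x << s) | (x >> (32 - s));   /* valid for 0 < s < 32 */
}

/* Two consecutive round-3 steps written the straightforward way.
 * st[] holds a, b, c, d; m0/m1 are message words, k0/k1 and s0/s1 are
 * the per-step constants and rotate counts. */
static void h_pair_plain(uint32_t st[4], uint32_t m0, uint32_t m1,
                         uint32_t k0, uint32_t k1, int s0, int s1)
{
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];

    a = b + rotl32(a + (b ^ c ^ d) + m0 + k0, s0);
    d = a + rotl32(d + (a ^ b ^ c) + m1 + k1, s1);
    st[0] = a; st[3] = d;
}

/* The same two steps with b ^ c computed once and reused. */
static void h_pair_cse(uint32_t st[4], uint32_t m0, uint32_t m1,
                       uint32_t k0, uint32_t k1, int s0, int s1)
{
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];
    uint32_t t = b ^ c;                  /* shared subexpression */

    a = b + rotl32(a + (t ^ d) + m0 + k0, s0);
    d = a + rotl32(d + (a ^ t) + m1 + k1, s1);
    st[0] = a; st[3] = d;
}

/* Hypothetical check: both variants must yield the same state.
 * Uses the MD5 initial state and the first two round-3 constants and
 * rotate counts; the message words are arbitrary. Returns 1 on match. */
static int h_pair_agree(void)
{
    uint32_t s1[4] = { 0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u };
    uint32_t s2[4] = { 0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u };
    int i;

    h_pair_plain(s1, 0x01234567u, 0x89abcdefu, 0xfffa3942u, 0x8771f681u, 4, 11);
    h_pair_cse(s2, 0x01234567u, 0x89abcdefu, 0xfffa3942u, 0x8771f681u, 4, 11);
    for (i = 0; i < 4; i++)
        if (s1[i] != s2[i])
            return 0;
    return 1;
}
```

The saving is one XOR per pair of steps; whether it is a net win in the
SSE code depends on register pressure, as the quoted comment points out.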

I haven't tried that one out yet.  Anyone?

Alexander

View attachment "sse-intrinsics.c.diff" of type "text/plain" (775 bytes)
