Date: Wed, 21 Mar 2012 09:13:54 +0400
From: Solar Designer
Subject: MD5 optimizations

magnum, all -

Searching the web for info on whether folks are already using SSSE3
PSHUFB for S-boxes or not (it turns out that yes, at least for AES), I
found these forum threads:

These mention some MD5 optimizations that I previously did not consider.

There are some rotates by 16 in MD5.  The attached patch optimizes those
for SSE2 (two instructions) and SSSE3 (one instruction).  Either of
these gives a speedup of around 1% on the 2xE5420 system I use for
testing, with the SSSE3 version being slightly faster.  (There's little
point in testing this on my Bulldozer, because it is XOP-capable and is
likely faster running the XOP code instead.)
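
For readers without the attached patch at hand, here is a minimal sketch of how a rotate by 16 can be done in fewer instructions than the generic shift/shift/OR. This is not the patch itself. The function names are made up for illustration, and the choice of PSHUFLW + PSHUFHW for the SSE2 variant and PSHUFB for the SSSE3 variant is an assumption based on the instruction counts given above. The SIMD parts are compiled only when the compiler enables the corresponding instruction set (e.g. gcc -mssse3).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>	/* SSE2 intrinsics */
#endif
#if defined(__SSSE3__)
#include <tmmintrin.h>	/* SSSE3 intrinsics */
#endif

/* Scalar reference: rotating a 32-bit word by 16 swaps its 16-bit halves. */
static uint32_t rol16_scalar(uint32_t x)
{
	return (x << 16) | (x >> 16);
}

#if defined(__SSE2__)
/* Generic SSE2 rotate: two shifts and an OR (three instructions). */
static __m128i rol16_generic(__m128i x)
{
	return _mm_or_si128(_mm_slli_epi32(x, 16), _mm_srli_epi32(x, 16));
}

/* SSE2, two instructions: swap adjacent 16-bit words in the low and
 * high 64-bit halves (PSHUFLW + PSHUFHW with immediate 0xb1). */
static __m128i rol16_sse2(__m128i x)
{
	x = _mm_shufflelo_epi16(x, 0xb1);
	return _mm_shufflehi_epi16(x, 0xb1);
}
#endif

#if defined(__SSSE3__)
/* SSSE3, one instruction: PSHUFB with a byte permutation that swaps
 * the two 16-bit halves of every 32-bit lane. */
static __m128i rol16_ssse3(__m128i x)
{
	const __m128i mask = _mm_set_epi8(13, 12, 15, 14, 9, 8, 11, 10,
	    5, 4, 7, 6, 1, 0, 3, 2);
	return _mm_shuffle_epi8(x, mask);
}
#endif

/* Check every variant that was compiled in against the scalar reference. */
static void rol16_self_test(void)
{
	const uint32_t in[4] = {
		0x12345678u, 0x00000001u, 0xdeadbeefu, 0xffff0000u
	};
	uint32_t want[4];
	int i;

	for (i = 0; i < 4; i++)
		want[i] = rol16_scalar(in[i]);
	assert(want[0] == 0x56781234u);

#if defined(__SSE2__)
	{
		const __m128i x = _mm_loadu_si128((const __m128i *)in);
		uint32_t got[4];

		_mm_storeu_si128((__m128i *)got, rol16_generic(x));
		for (i = 0; i < 4; i++)
			assert(got[i] == want[i]);

		_mm_storeu_si128((__m128i *)got, rol16_sse2(x));
		for (i = 0; i < 4; i++)
			assert(got[i] == want[i]);
#if defined(__SSSE3__)
		_mm_storeu_si128((__m128i *)got, rol16_ssse3(x));
		for (i = 0; i < 4; i++)
			assert(got[i] == want[i]);
#endif
	}
#endif
}
```

Calling rol16_self_test() checks whichever variants were built. Whether either shuffle form is actually faster than the generic one depends on the CPU and on how well the compiler schedules the shuffles, so it needs benchmarking on the target.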

I only tested this with gcc so far, so the resulting code is slower than
the icc precompiled version in all of my tests.  I think the *.S files
need to be re-generated with icc (perhaps for SSE2 only for simplicity?)
after applying this patch.

Another possible optimization is a common subexpression elimination in
round 3:

but it might not always be helpful ("The second optimization is not good
because it trades a PXOR rd,rs for a MOVDQA rd,rs. Now with AVX this
might be useful because it lets you do VPXOR rd,rs1,rs2 (rd = rs1^rs2).
Even with AVX you need more registers because you now need two temp
registers per interlace ..." from a comment by Sc00bz).
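
To make the trade-off concrete, here is a scalar sketch of one way to read that round 3 idea. It is an assumption, not necessarily the exact form proposed in the forum thread. Round 3 uses H(x,y,z) = x^y^z, and two consecutive steps compute b^c^d and then a^b^c (with the updated a), so b^c can be shared. The function names are hypothetical. In two-operand SSE code, keeping t alive across the first XOR is what costs the extra MOVDQA and the extra temp register that the quoted comment refers to.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; s must be in 1..31 (true for all MD5 shift counts). */
static uint32_t rol32(uint32_t x, unsigned int s)
{
	return (x << s) | (x >> (32 - s));
}

/* Two consecutive round 3 steps written directly: four XORs. */
static void md5_h_pair_plain(uint32_t v[4], const uint32_t m[2],
    const uint32_t k[2], unsigned int s0, unsigned int s1)
{
	uint32_t a = v[0], b = v[1], c = v[2], d = v[3];

	a = b + rol32(a + (b ^ c ^ d) + m[0] + k[0], s0);
	d = a + rol32(d + (a ^ b ^ c) + m[1] + k[1], s1);
	v[0] = a; v[1] = b; v[2] = c; v[3] = d;
}

/* Same two steps with b^c computed once: three XORs, one extra temp. */
static void md5_h_pair_cse(uint32_t v[4], const uint32_t m[2],
    const uint32_t k[2], unsigned int s0, unsigned int s1)
{
	uint32_t a = v[0], b = v[1], c = v[2], d = v[3];
	const uint32_t t = b ^ c;	/* shared by both steps */

	a = b + rol32(a + (t ^ d) + m[0] + k[0], s0);
	d = a + rol32(d + (a ^ t) + m[1] + k[1], s1);
	v[0] = a; v[1] = b; v[2] = c; v[3] = d;
}

/* Both forms must agree; inputs here are arbitrary test values. */
static void md5_h_pair_self_test(void)
{
	uint32_t p[4] = { 0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u };
	uint32_t q[4] = { 0x67452301u, 0xefcdab89u, 0x98badcfeu, 0x10325476u };
	const uint32_t m[2] = { 0x01234567u, 0x89abcdefu };
	/* Constants and shifts of the first two round 3 steps of MD5 */
	const uint32_t k[2] = { 0xfffa3942u, 0x8771f681u };
	int i;

	md5_h_pair_plain(p, m, k, 4, 11);
	md5_h_pair_cse(q, m, k, 4, 11);
	for (i = 0; i < 4; i++)
		assert(p[i] == q[i]);
}
```

This only shows that the two forms are equivalent. It says nothing about speed, which is the open question here.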

I haven't tried that one out yet.  Anyone?


View attachment "sse-intrinsics.c.diff" of type "text/plain" (775 bytes)
