Date: Wed, 21 Mar 2012 09:13:54 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: MD5 optimizations

magnum, all -

Searching the web for info on whether folks are already using SSSE3 PSHUFB for S-boxes or not (it turns out that yes, at least for AES), I found these forum threads:

http://hashcat.net/forum/thread-153.html
http://www.freerainbowtables.com/phpBB3/viewtopic.php?f=6&t=904&start=60#p15387
http://www.cryptohaze.com/forum/viewtopic.php?f=4&t=147#p865

These mention some MD5 optimizations that I previously did not consider.

There are some rotates by 16 in MD5. The attached patch optimizes those for SSE2 (two instructions) and SSSE3 (one instruction). Either of these gives a speedup of around 1% on the 2xE5420 system I use for testing, with the SSSE3 version being slightly faster. (There's little point in testing this on my Bulldozer, because it is XOP-capable and is likely faster running the XOP code instead.)

I only tested this with gcc so far, so the resulting code is slower than the icc precompiled version in all of my tests. I think the *.S files need to be re-generated with icc (perhaps for SSE2 only for simplicity?) after applying this patch.

Another possible optimization is a common subexpression elimination in round 3:

http://hashcat.net/forum/thread-153-post-709.html#pid709

but it might not always be helpful ("The second optimization is not good because it trades a PXOR rd,rs for a MOVDQA rd,rs. Now with AVX this might be useful because it let's you do VPXOR rd,rs1,rs2 (rd = rs1^rs2). Even with AVX you need more registers because you now need two temp registers per interlace ..." from a comment by Sc00bz). I haven't tried that one out yet. Anyone?

Alexander

View attachment "sse-intrinsics.c.diff" of type "text/plain" (775 bytes)
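For readers without the attachment, here is a minimal sketch of what a rotate-by-16 specialization of this kind could look like with intrinsics. It is not the contents of sse-intrinsics.c.diff: the function names (rot16_shift, rot16_sse2, rot16_ssse3, rot16_selftest) are hypothetical, and the choice of PSHUFLW + PSHUFHW for SSE2 and a PSHUFB byte mask for SSSE3 is an assumption based on the instruction counts mentioned above. A rotate of a 32-bit word by 16 simply swaps its two 16-bit halves, which is why a shuffle can replace the usual shift/shift/OR sequence.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* SSE2 intrinsics */
#endif
#if defined(__SSSE3__)
#include <tmmintrin.h>          /* SSSE3 intrinsics (_mm_shuffle_epi8) */
#endif

/* Scalar reference: rotate left by s bits, valid for 0 < s < 32. */
static uint32_t rotl32(uint32_t x, unsigned s)
{
    return (x << s) | (x >> (32 - s));
}

#if defined(__SSE2__)
/* Generic vector rotate by 16: shift left, shift right, OR. */
static __m128i rot16_shift(__m128i x)
{
    return _mm_or_si128(_mm_slli_epi32(x, 16), _mm_srli_epi32(x, 16));
}

/* SSE2: swap adjacent 16-bit words in the low and high 64-bit halves
 * (PSHUFLW + PSHUFHW). 0xb1 selects words in the order 1,0,3,2. */
static __m128i rot16_sse2(__m128i x)
{
    return _mm_shufflehi_epi16(_mm_shufflelo_epi16(x, 0xb1), 0xb1);
}

/* Compare a vector against four expected 32-bit words; 1 on mismatch. */
static int check_vec(__m128i v, const uint32_t want[4])
{
    uint32_t got[4];
    _mm_storeu_si128((__m128i *)got, v);
    return memcmp(got, want, sizeof(got)) != 0;
}
#endif

#if defined(__SSSE3__)
/* SSSE3: a single PSHUFB whose mask maps bytes 2,3,0,1 of each 32-bit
 * word to positions 0,1,2,3. */
static __m128i rot16_ssse3(__m128i x)
{
    const __m128i mask = _mm_set_epi32(0x0d0c0f0e, 0x09080b0a,
                                       0x05040706, 0x01000302);
    return _mm_shuffle_epi8(x, mask);
}
#endif

/* Returns the number of variants that disagree with the scalar rotate. */
int rot16_selftest(void)
{
    const uint32_t in[4] = { 0x01234567u, 0x89abcdefu,
                             0xdeadbeefu, 0x0000ffffu };
    uint32_t want[4];
    int bad = 0;
    int i;

    for (i = 0; i < 4; i++)
        want[i] = rotl32(in[i], 16);
    bad += (want[0] != 0x45670123u);

#if defined(__SSE2__)
    {
        __m128i x = _mm_loadu_si128((const __m128i *)in);
        bad += check_vec(rot16_shift(x), want);
        bad += check_vec(rot16_sse2(x), want);
#if defined(__SSSE3__)
        bad += check_vec(rot16_ssse3(x), want);
#endif
    }
#endif
    return bad;
}
```

Calling rot16_selftest() from a small main() should return 0 if every compiled-in variant matches the scalar rotate. The SSSE3 path is only built when the compiler defines __SSSE3__ (for example with gcc -mssse3); otherwise only the scalar and SSE2 variants are checked. Whether either shuffle variant is actually faster than the shift/OR form depends on the CPU and compiler, as the 1% figure above suggests.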