Date: Sun, 8 Jul 2012 10:20:40 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: MD5 timings (x96-32)

On Sat, Jul 07, 2012 at 04:57:08PM -0500, jfoug wrote:
> Also, I think we should make changes in the x86-mmx.h and x86-sse.h, to have
> these build types use MD5_X2 and not asm.

We'll need to combine this with a gcc version check like I do for BF_X2,
but with different gcc versions. Here's what I get on Core i7 920 (best
of several runs since the machine is a server with some light load):

gcc 3.4.5, x1 - 8100 c/s (asm code, reference speed)
gcc 3.4.5, x2 - 5500 c/s
gcc 4.0.0, x2 - 8250 c/s
gcc 4.1.0, x2 - 8200 c/s
gcc 4.2.0, x2 - 7940 c/s
gcc 4.5.0, x2 - 8750 c/s

Same binaries copied to Pentium 3, 1.0 GHz, no load:

gcc 3.4.5, x1 - 2462 c/s (asm code, reference speed)
gcc 3.4.5, x2 - 1824 c/s
gcc 4.0.0, x2 - 2666 c/s
gcc 4.1.0, x2 - 2582 c/s
gcc 4.2.0, x2 - 2520 c/s
gcc 4.5.0, x2 - 2670 c/s

So there's some slight speedup with this change when using gcc 4+, but
regressions are possible (seen with 4.2.0 on the Core i7 so far) and
more testing is needed (more gcc versions, more CPUs).

I guess similar speedup is possible by tuning the x1 asm code for PPro
family CPUs (it's currently tuned for the original Pentium), similarly
to how we have two code versions for Blowfish in x86.S (with runtime
detection). I am unhappy about spending time on this mostly legacy
code, but trying lots of gcc versions on lots of CPUs is the same thing
in this respect.

A related possible optimization is common subexpression elimination in
8 out of 16 steps in MD5's round 3. It can have non-obvious effect on
performance too, since on one hand it's one fewer XOR per step, but on
the other a register is tied to holding this value between steps.
Attached is an experimental patch for this, also experimenting with LEA
instead of ADD in _some_ places (would likely need to do it in all if
beneficial on target CPUs).
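
To illustrate where the shared subexpression in round 3 comes from, here
is a minimal C sketch. It is not taken from the attached patch (which
changes the x86 asm), and the function names are made up for this
example. Round 3 uses H(x, y, z) = x ^ y ^ z. One step needs b ^ c ^ d,
and the next needs a ^ b ^ c with the freshly updated a, so b ^ c can be
computed once and reused. That saves one XOR in every second step, at
the cost of keeping the temporary live across the pair of steps. The
tables are the round 3 values from RFC 1321.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* MD5 round 3 tables (RFC 1321): additive constants, message word
 * order, and the four rotate counts that repeat every four steps. */
static const uint32_t K3[16] = {
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665
};
static const unsigned char IDX3[16] = {
    5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3, 6, 9, 12, 15, 2
};
static const unsigned char ROT3[4] = { 4, 11, 16, 23 };

static uint32_t rotl32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s)); /* s is never 0 here */
}

/* Straightforward round 3: two XORs in every step. */
void md5_round3_plain(uint32_t st[4], const uint32_t x[16])
{
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3], t;
    int i;

    for (i = 0; i < 16; i++) {
        a = b + rotl32(a + (b ^ c ^ d) + x[IDX3[i]] + K3[i], ROT3[i & 3]);
        /* Rotate roles: the next step updates the old d. */
        t = d; d = c; c = b; b = a; a = t;
    }
    st[0] = a; st[1] = b; st[2] = c; st[3] = d;
}

/* Same round with b ^ c shared between each pair of steps. */
void md5_round3_cse(uint32_t st[4], const uint32_t x[16])
{
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3], t, u;
    int i;

    for (i = 0; i < 16; i += 2) {
        t = b ^ c; /* computed once, used by both steps below */
        a = b + rotl32(a + (t ^ d) + x[IDX3[i]] + K3[i], ROT3[i & 3]);
        /* This step needs a ^ b ^ c with the new a, which is a ^ t. */
        d = a + rotl32(d + (a ^ t) + x[IDX3[i + 1]] + K3[i + 1],
            ROT3[(i + 1) & 3]);
        /* Rotate roles by two positions for the next pair. */
        u = a; a = c; c = u;
        u = b; b = d; d = u;
    }
    st[0] = a; st[1] = b; st[2] = c; st[3] = d;
}

/* Returns 1 if both variants produce the same state for this input. */
int md5_round3_variants_agree(const uint32_t st[4], const uint32_t x[16])
{
    uint32_t p[4], q[4];

    memcpy(p, st, sizeof(p));
    memcpy(q, st, sizeof(q));
    md5_round3_plain(p, x);
    md5_round3_cse(q, x);
    return memcmp(p, q, sizeof(p)) == 0;
}

/* Sanity check on one arbitrary input (not a real MD5 test vector). */
void md5_round3_self_check(void)
{
    static const uint32_t st[4] = {
        0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
    };
    uint32_t x[16];
    int i;

    for (i = 0; i < 16; i++)
        x[i] = 0x01234567u * (uint32_t)(i + 1);
    assert(md5_round3_variants_agree(st, x));
}
```

Calling md5_round3_self_check() only confirms that the two forms compute
the same thing. It says nothing about speed, which depends on register
pressure and on what the compiler (or hand-written asm) does with the
temporary.
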
I tried this a while ago, and abandoned it for lack of speedup (roughly
same speed on P3), although I guess that some speedup could be obtained
with more effort.

Just to make it clear: this patch is _not_ meant to be committed into
any tree. We're just discussing.

Alexander

View attachment "john-x86-md5.diff" of type "text/plain" (2413 bytes)