Date: Mon, 28 Jun 2010 13:15:55 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: faster DES on Atom

Hi,

I think Dango-Chu should have posted this a year ago, but since he did not, I figured that I should do it myself and better late than never - just to keep the "JtR knowledge" in one place.

It turned out that for Atom CPUs it is beneficial to change JtR's bitslice DES SSE2 assembly code to use plain SSE instructions (not SSE2), because those are one byte shorter, which helps the decoder and/or caching. I had expected this to turn out to be the case on some CPU, but when working on this code in 2006 I only encountered CPUs that executed both SSE and SSE2 versions of the code at the same speed (AMD CPUs) and those that executed the SSE2 code much faster (Intel CPUs), which is why the decision to go with SSE2 only was made.

The switch to plain SSE is trivial to make - just replace all occurrences of five SSE2 instructions with their SSE equivalents in x86-sse.S and/or x86-64.S. This can be done with the following command (uses recent GNU sed):

sed -i 's/movdqa/movaps/; s/pandn/andnps/; s/pand/andps/; s/por/orps/; s/pxor/xorps/' x86-sse.S x86-64.S

According to Dango-Chu's benchmarks, this provides a 10% speedup for 32-bit builds, but only a 0.5% speedup for 64-bit builds - both on an Atom, indeed. The numbers could be different on other Atom CPUs, and indeed they're very different on non-Atom CPUs - e.g., there's a 2x slowdown from the same change on a Core i7 (just tried).

With JtR 1.7.6+, you may additionally need to edit this check in x86-64.h:

#if defined(__SSE2__) && \
    ((__GNUC__ == 4 && __GNUC_MINOR__ >= 4) || __GNUC__ > 4)
#define DES_BS_ASM 0
[...]

The purpose of this check is to disable the assembly code in favor of gcc-generated SSE2 code when gcc 4.4.0 or newer is being used. To force the use of plain SSE instead of SSE2, yet compile with gcc 4.4+, you'll need to override this check. This might not result in any improvement, though, not even on an Atom, because the speedup from plain SSE for a 64-bit build, as measured by Dango-Chu, was negligible (see above).

The original blog post by Dango-Chu, in Japanese:

http://dango.chu.jp/tripper/20090429.html#p01

The same change might also be beneficial on some other CPUs - anyone with a Pentium M?

Alexander
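
The mnemonic substitution described above can also be sketched as a filter that leaves the source files untouched, so the result can be reviewed before anything is changed in place. This is only an illustration, not part of JtR: the function name sse2_to_sse is hypothetical, the \< and \> word-boundary anchors assume GNU sed, and the sketch assumes the five mnemonics appear as whole words in the assembly sources. Matching whole words keeps the patterns from touching unrelated text that merely contains "por" or "pand" (for example, a comment with the word "temporary"), and it makes the order of the five substitutions irrelevant.

```shell
#!/bin/sh
# Hypothetical helper (not part of JtR): read assembly on stdin and write it
# to stdout with the five SSE2 mnemonics replaced by their plain SSE
# equivalents. Assumes GNU sed for the \< and \> word-boundary anchors.
sse2_to_sse() {
    sed -e 's/\<movdqa\>/movaps/g' \
        -e 's/\<pandn\>/andnps/g' \
        -e 's/\<pand\>/andps/g' \
        -e 's/\<por\>/orps/g' \
        -e 's/\<pxor\>/xorps/g'
}

# Demo on a made-up two-line fragment (not taken from the JtR sources).
printf 'movdqa %%xmm0,%%xmm1\npandn %%xmm2,%%xmm1\n' | sse2_to_sse
```

To preview the effect on a real file, the output can be piped into diff, e.g. "sse2_to_sse < x86-sse.S | diff -u x86-sse.S -", and the in-place sed command from the message applied only once the diff looks right.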