Date: Thu, 15 Mar 2012 17:24:53 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

On Thu, Mar 15, 2012 at 11:46:09AM +0200, Milen Rangelov wrote:
> [...] I did some more
> improvements (like e.g using SSE3 shuffle to speed up the byte order
> reversals in SHA1 and optimizing a bit the early checks).

Sounds cool.  You got to start contributing to JtR more directly. ;-)

> Well, I tried using 128-bit xmm instructions, not the ymm version (which
> would require a lot of rework).

Oh, you use explicit asm, not intrinsics?  Does XOP even offer bit-select
instructions for XMM registers?  I thought it only added VPCMOV, which
operates on YMM registers.  Or do you mix XMM/YMM?

> Speed with both the Kwan's sboxes/SSE2 and
> Roman's with XOP is about 5M c/s (6 threads). I believe the problem is
> somewhere else (my key/block setup to blame perhaps).

Maybe you unrolled the loop too much and it doesn't fit in L1 cache?
With inlined S-boxes, we're only able to unroll 2 rounds (so we get 8
loop iterations per DES block encryption).

Alexander
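
P.S. For anyone following the bit-select question above, here is a minimal portable C sketch of what such an operation computes: each result bit comes from one of two inputs, chosen by a mask bit. This is not code from JtR or from Milen's program. The names bitselect, bitselect_xor and bitselect_selftest are made up for illustration, and plain 64-bit integers stand in for vector registers. Without a native select instruction, the operation costs two or three ordinary logic instructions, which is why a single-instruction form matters for bitslice S-boxes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for each bit position, take the bit from a
 * where mask is 1 and from b where mask is 0. */
static inline uint64_t bitselect(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* Same result using XOR and AND only (no NOT or OR), a common
 * fallback when the instruction set has no select operation. */
static inline uint64_t bitselect_xor(uint64_t a, uint64_t b, uint64_t mask)
{
    return b ^ ((a ^ b) & mask);
}

/* Small self-check: both forms must agree on sample inputs. */
static inline void bitselect_selftest(void)
{
    const uint64_t a = 0xAAAAu, b = 0x5555u, mask = 0xFF00u;

    assert(bitselect(a, b, mask) == 0xAA55u);
    assert(bitselect_xor(a, b, mask) == bitselect(a, b, mask));
}
```

With a = 0xAAAA, b = 0x5555 and mask = 0xFF00, the high byte comes from a and the low byte from b, giving 0xAA55.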