Date: Thu, 25 Jun 2015 07:33:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

Regarding the 2x2 MMX2 code on i7-4770K:

On Wed, Jun 24, 2015 at 07:10:07AM +0300, Solar Designer wrote:
> On 64-bit builds, though, I only got this to run at cumulative speeds
> like 780*8 = 6240 c/s, which is worse than 6595 c/s previously seen with
> OpenMP (and even worse than the slightly better speeds that can be seen
> with separate independent processes).

I managed to improve this to 796*8 = 6368 c/s by removing some of the
large displacements on loads, and instead keeping them in base registers
(using the extra GPRs that we have in 64-bit mode for this).

For the 288 bytes of P, an offset into the middle of this range may be
put into a register, and then 256 out of the 288 bytes may be accessed
via 1-byte displacements (or alternatively 248 out of 288, but then we
can also access the first S-box via the same base register with 0x78 in
the 1-byte displacement).

Also, remembering that R13 is special just like RBP (no
without-displacement encoding) can sometimes be helpful.

This is still not good enough, though.

Alexander
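
The displacement arithmetic in the message above can be checked with a small sketch. This is not code from John the Ripper: the helper names disp_bytes() and reachable() are hypothetical, and the 248-byte case assumes a layout in which the first S-box starts immediately after the 288 bytes of P. The model only counts displacement bytes in the ModR/M encoding and ignores SIB bytes and prefixes.

```c
#include <assert.h>
#include <stdint.h>

/* Register numbers as used in x86-64 ModR/M encoding. */
enum { RAX = 0, RBP = 5, R13 = 13 };

/*
 * Bytes of displacement needed to encode [base_reg + disp].
 * With mod=00, r/m=101 does not mean "[rbp]", so RBP and R13 (both have
 * low three bits 101) need at least a 1-byte displacement, even for 0.
 */
static int disp_bytes(int base_reg, int32_t disp)
{
    if (disp == 0 && (base_reg & 7) != 5)
        return 0;               /* no displacement byte at all */
    if (disp >= -128 && disp <= 127)
        return 1;               /* sign-extended disp8 */
    return 4;                   /* disp32 */
}

/*
 * Number of bytes in a region of the given size that can be addressed
 * with at most a 1-byte displacement when the base register points
 * base_off bytes past the start of the region.
 */
static int reachable(int size, int base_off)
{
    int n = 0;

    assert(size >= 0);
    for (int off = 0; off < size; off++)
        if (disp_bytes(RAX, off - base_off) <= 1)
            n++;
    return n;
}
```

Under these assumptions, reachable(288, 128) counts the bytes of P covered when the base register points 128 bytes into P, and reachable(288, 288 - 0x78) counts them when the base is chosen so that the assumed S-box start at offset 288 sits at displacement 0x78. disp_bytes(R13, 0) versus disp_bytes(RAX, 0) shows the extra byte paid for using R13 (or RBP) as a base with a zero offset.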