Date: Thu, 25 Jun 2015 01:45:02 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

On Wed, Jun 24, 2015 at 06:06:16PM -0400, Alain Espinosa wrote:
> ...I got speedup for 1 thread/core, but
> significant slowdown for 2 threads/core.
>
> This is other thing that is different in my tests (may be my asm code is suboptimal). In a core i3-2120 I get 4% speed up interleaving 3 keys instead of 2. This is using 4 threads.

Of course, on an HT-less CPU you need to interleave 3 or 4 instances
rather than just 2.

In fact, with my 2x2 MMX2 code I am experimenting with 4 parallel
instances (2x crippled SIMD, 2x interleaving) on i7-4770K as well.
Replacing those SHLD with MOV+SHR got me slowdown for 2 threads/core
even at 4 instances/thread.  (But that's the 2x2 thing, not simple 4x
interleaving.)

Replacing those SHLD with SHRX similarly speeds things up for 1
thread/core (just like MOV+SHR), but keeps performance the same for 2
threads/core (unlike the slowdown seen with MOV+SHR).  Unfortunately,
it wastes a register to hold the shift count, which may prevent other
optimizations.

Alexander
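
For readers who have not seen the code under discussion, here is a rough C sketch of what "interleaving N instances" means for the Blowfish rounds at the core of bcrypt. It is not the assembly from this thread: the names `bf_state`, `bf_f`, `bf_encrypt_one`, `bf_encrypt_interleaved`, `shld32` and `self_check`, and the choice `N = 4`, are all made up for illustration. `shld32` only models the documented semantics of the x86 SHLD instruction; the message does not show which operands the real code passes to SHLD, MOV+SHR or SHRX, so the byte-extraction comment is an assumption.

```c
#include <stdint.h>

#define N 4  /* interleaved instances per thread (hypothetical; tune per CPU) */

/* One Blowfish state per candidate key: bcrypt's S-boxes are key-dependent. */
typedef struct {
    uint32_t S[4][256];
    uint32_t P[18];
} bf_state;

/* Model of x86 "SHLD dst, src, n" for 0 < n < 32: shift dst left by n,
 * filling the vacated low bits with the top n bits of src. */
uint32_t shld32(uint32_t dst, uint32_t src, unsigned n)
{
    return (dst << n) | (src >> (32 - n));
}

/* Blowfish F function.  The top byte could equally be obtained as the low
 * byte of shld32(0, x, 8); a plain shift of a copy of x is what MOV+SHR or
 * SHRX would compute (assumed mapping, not taken from the message). */
uint32_t bf_f(const bf_state *s, uint32_t x)
{
    uint32_t a = x >> 24;
    uint32_t b = (x >> 16) & 0xff;
    uint32_t c = (x >> 8) & 0xff;
    uint32_t d = x & 0xff;
    return ((s->S[0][a] + s->S[1][b]) ^ s->S[2][c]) + s->S[3][d];
}

/* Reference: 16 rounds for a single instance, two rounds per iteration. */
void bf_encrypt_one(const bf_state *s, uint32_t *L, uint32_t *R)
{
    uint32_t l = *L, r = *R;
    for (int i = 0; i < 16; i += 2) {
        l ^= s->P[i];
        r ^= bf_f(s, l);
        r ^= s->P[i + 1];
        l ^= bf_f(s, r);
    }
    l ^= s->P[16];
    r ^= s->P[17];
    *L = r;
    *R = l;
}

/* Same computation for N independent instances, with each round issued for
 * all instances before moving on.  The S-box loads of one instance do not
 * depend on the others, so the CPU can overlap their latencies. */
void bf_encrypt_interleaved(const bf_state s[N], uint32_t L[N], uint32_t R[N])
{
    for (int i = 0; i < 16; i += 2) {
        for (int k = 0; k < N; k++) {
            L[k] ^= s[k].P[i];
            R[k] ^= bf_f(&s[k], L[k]);
        }
        for (int k = 0; k < N; k++) {
            R[k] ^= s[k].P[i + 1];
            L[k] ^= bf_f(&s[k], R[k]);
        }
    }
    for (int k = 0; k < N; k++) {
        uint32_t l = L[k] ^ s[k].P[16];
        uint32_t r = R[k] ^ s[k].P[17];
        L[k] = r;
        R[k] = l;
    }
}

/* Fill N states with xorshift32 filler data (not real key setup) and check
 * that the interleaved version agrees with the one-at-a-time version.
 * Returns 1 on agreement, 0 otherwise. */
int self_check(void)
{
    static bf_state s[N];
    uint32_t seed = 0x12345678u, L[N], R[N];
    uint32_t *w = (uint32_t *)s;
    for (unsigned i = 0; i < sizeof(s) / sizeof(uint32_t); i++) {
        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        w[i] = seed;
    }
    for (int k = 0; k < N; k++) {
        L[k] = 0x01020304u * (uint32_t)(k + 1);
        R[k] = 0x0a0b0c0du + (uint32_t)k;
    }
    uint32_t refL[N], refR[N];
    for (int k = 0; k < N; k++) {
        refL[k] = L[k];
        refR[k] = R[k];
        bf_encrypt_one(&s[k], &refL[k], &refR[k]);
    }
    bf_encrypt_interleaved(s, L, R);
    for (int k = 0; k < N; k++)
        if (L[k] != refL[k] || R[k] != refR[k])
            return 0;
    return 1;
}
```

Calling `self_check()` from a `main` of your own only confirms that the interleaved ordering computes the same result as the sequential one. Whether 2, 3 or 4 instances is fastest, and which shift instruction wins, has to be measured on the target CPU at both 1 and 2 threads/core, as the numbers above show.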