Date: Mon, 27 Mar 2017 13:55:34 +0200
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

On Sun, Mar 26, 2017 at 08:52:45PM +0200, Solar Designer wrote:
> On Wed, Jun 24, 2015 at 09:43:47AM +0300, Solar Designer wrote:
> > BTW, optimizing the effective address calculation could make a
> > difference. In my testing on Haswell, not of bcrypt code but in
> > general, there is performance impact from using larger than 1-byte
> > displacement, as well as from using an index register (even if the
> > scaling factor is set to 1). Loads that use only a base register with
> > either no displacement or a 1-byte displacement (sufficient e.g. to skip
> > P-boxes and get to the first S-box) run faster (or rather, have less
> > impact on issue rate of adjacent instructions).
>
> FWIW, I just found this note:
>
> http://www.7-cpu.com/cpu/Haswell.html
>
> L1 Data Cache Latency = 4 cycles for simple access via pointer
> L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
>
> So maybe (just maybe) the "performance impact from using larger than
> 1-byte displacement, as well as from using an index register" was
> actually from an extra cycle of latency on the load itself (the effect
> of which on issue of other instructions may vary).

Also, I just noticed this:

http://www.realworldtech.com/haswell-cpu/5/

"in Sandy Bridge [...] Address generation was a single cycle when the
(base + offset) is < 2K, with an extra cycle for larger (base + offset)
or (base + index + offset) addressing"

and per that same article it sounds like this remained the same in
Haswell.

Alexander
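
For illustration only, the two access patterns from the 7-cpu note quoted
above ("simple access via pointer" and "n = p[n]") can be sketched in C as
two dependent-load chains over the same 4 KiB table. This is a minimal
sketch, not code from the thread: the names N, build_chain, chase_ptr and
chase_idx are hypothetical, and whether the loops really compile to a
base-only load versus a base + index*8 load depends on the compiler and
flags, so the generated assembly would need to be checked.

```c
#include <assert.h>
#include <stddef.h>

#define N 512 /* hypothetical size: 512 * 8 bytes = 4 KiB on x86-64, fits in L1D */

/* Link both tables into the same single cycle: i -> (i + stride) % N.
 * An odd stride is coprime with N (a power of two), so all N slots are visited. */
static void build_chain(size_t *idx, void **ptr, size_t stride)
{
    assert(stride % 2 == 1);
    for (size_t i = 0; i < N; i++) {
        size_t next = (i + stride) % N;
        idx[i] = next;        /* index form:   n = p[n] */
        ptr[i] = &ptr[next];  /* pointer form: p = *p   */
    }
}

/* "Simple access via pointer": the loaded value is itself the next address,
 * so the load can use a base register only (assumption: e.g. mov (%rax),%rax). */
static void **chase_ptr(void **p, size_t steps)
{
    while (steps--)
        p = (void **)*p;
    return p;
}

/* "Complex address calculation": the loaded value is an index, so the load
 * needs base + index*8 (assumption: e.g. mov (%rdx,%rax,8),%rax). */
static size_t chase_idx(const size_t *p, size_t n, size_t steps)
{
    while (steps--)
        n = p[n];
    return n;
}
```

To compare latencies, one would call build_chain() once, then time many
steps of chase_ptr() and chase_idx() separately (e.g. with clock_gettime()
or a cycle counter) and divide by the step count. Each load depends on the
previous one, so the per-step time approximates load-to-use latency. The
return values must be used (printed or stored to a volatile) so that the
compiler does not discard the loops.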