Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Mon, 27 Mar 2017 13:55:34 +0200
From: Solar Designer <>
Subject: Re: optimizing bcrypt cracking on x86

On Sun, Mar 26, 2017 at 08:52:45PM +0200, Solar Designer wrote:
> On Wed, Jun 24, 2015 at 09:43:47AM +0300, Solar Designer wrote:
> > BTW, optimizing the effective address calculation could make a
> > difference.  In my testing on Haswell, not of bcrypt code but in
> > general, there is performance impact from using larger than 1-byte
> > displacement, as well as from using an index register (even if the
> > scaling factor is set to 1).  Loads that use only a base register with
> > either no displacement or a 1-byte displacement (sufficient e.g. to skip
> > P-boxes and get to the first S-box) run faster (or rather, have less
> > impact on issue rate of adjacent instructions).
> FWIW, I just found this note:
>     L1 Data Cache Latency = 4 cycles for simple access via pointer
>     L1 Data Cache Latency = 5 cycles for access with complex address calculation (size_t n, *p; n = p[n]).
> So maybe (just maybe) the "performance impact from using larger than
> 1-byte displacement, as well as from using an index register" was
> actually from an extra cycle of latency on the load itself (the effect
> of which on issue of other instructions may vary).

Also, I just noticed this:

"in Sandy Bridge [...] Address generation was a single cycle when the
(base + offset) is < 2K, with an extra cycle for larger (base + offset)
or (base + index + offset) addressing", and, per that same article, this
appears to have remained the same in Haswell.
