Date: Thu, 27 Jun 2013 20:34:25 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

On Thu, Jun 27, 2013 at 11:35:05AM -0400, Yaniv Sapir wrote:
> FWIW, looking at the disassembly, the loop there seems to span
> ~50 instructions, which means that it takes on the order of 50
> cycles (some stalls due to dependencies are probably balanced by
> instruction multi-issue). If this is right, then unrolling the loop further
> will yield only a marginal gain, as the branch penalty is 4-5 cycles.

Yes, marginal gain indeed (4-5 cycles out of ~50 is at most about 10%) -
yet we should do it if we can.

Full unrolling lets us save not only on loop control instructions, but
also on updates of index/pointer variables: constant displacements can
instead be encoded directly into some of the load instructions.
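
To illustrate the point about constant displacements, here is a minimal
sketch in C. It is not the actual bcrypt code: toy_f(), mix_looped(),
mix_unrolled() and ROUND_PAIR() are hypothetical names, and toy_f() is
only a stand-in for the real Blowfish round function. The looped version
has to maintain i and index p[] with it, whereas in the unrolled version
every p[] access uses a compile-time constant offset, which a compiler
can typically fold into the load instruction.

```c
#include <stdint.h>

#define ROUNDS 16

/* Hypothetical stand-in for a round function; NOT the real Blowfish F. */
static uint32_t toy_f(uint32_t x)
{
    return (x << 3) ^ (x >> 5) ^ 0x9e3779b9u;
}

/*
 * Looped version: every iteration pays for the counter update, the
 * compare, the branch, and loads addressed via the changing index i.
 */
uint32_t mix_looped(const uint32_t *p, uint32_t l, uint32_t r)
{
    for (int i = 0; i < ROUNDS; i += 2) {
        l ^= p[i];
        r ^= toy_f(l);
        r ^= p[i + 1];
        l ^= toy_f(r);
    }
    return l ^ r;
}

/* Two rounds with constant indices n and n + 1 into p[]. */
#define ROUND_PAIR(n) \
    l ^= p[(n)];     r ^= toy_f(l); \
    r ^= p[(n) + 1]; l ^= toy_f(r);

/*
 * Fully unrolled version: no counter, no branch, and each p[] access
 * has a fixed offset known at compile time.
 */
uint32_t mix_unrolled(const uint32_t *p, uint32_t l, uint32_t r)
{
    ROUND_PAIR(0)  ROUND_PAIR(2)  ROUND_PAIR(4)  ROUND_PAIR(6)
    ROUND_PAIR(8)  ROUND_PAIR(10) ROUND_PAIR(12) ROUND_PAIR(14)
    return l ^ r;
}
```

Both functions compute the same result; whether the unrolled form is
actually faster on a given core has to be checked in the disassembly and
by measurement.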

Alexander
