Date: Thu, 27 Jun 2013 20:34:25 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

On Thu, Jun 27, 2013 at 11:35:05AM -0400, Yaniv Sapir wrote:
> FWIW, looking at the disassembly, it seems like the loop there spans over
> ~50 instructions, which means that it takes the order of magnitude of 50
> cycles (probably some stalls due to dependency are balanced with
> instruction multi-issue). If this is right, then unrolling the loop further
> will add marginal gain - as the branch penalty is 4-5 cycles.

Yes, marginal gain indeed - yet we should do it if we can.  Full
unrolling lets us save not only on loop control instructions, but also
on updating of index/pointer variables (instead constant displacements
might be substituted into some of the load instructions).

Alexander
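
To illustrate the point about constant displacements, here is a minimal
sketch in C.  It is not the bcrypt code discussed in this thread: ROUNDS,
toy_f(), ROUND() and the function names are hypothetical, and toy_f() is
only a stand-in for a real round function.  The rolled version keeps a
loop counter and a moving pointer; the fully unrolled version indexes the
key with compile-time constants, which a compiler may be able to turn
into loads at fixed offsets from one base register (whether it does so
depends on the target and the optimization level).

```c
#include <assert.h>
#include <stdint.h>

#define ROUNDS 16

/* Hypothetical stand-in for a cipher round function (not Blowfish's F). */
static uint32_t toy_f(uint32_t x)
{
    return ((x << 5) | (x >> 27)) ^ (x * 0x9E3779B9u);
}

/* Rolled: each iteration pays for the counter update, the compare and
 * branch, and the pointer increment, in addition to the round itself. */
uint32_t rounds_rolled(const uint32_t *key, uint32_t l, uint32_t r)
{
    const uint32_t *p = key;
    int i;

    for (i = 0; i < ROUNDS; i++) {
        uint32_t t;
        l ^= *p++;          /* load through a pointer that must be updated */
        r ^= toy_f(l);
        t = l; l = r; r = t; /* swap halves for the next round */
    }
    return l ^ r;
}

/* One round with a constant key index; the half swap is expressed by
 * alternating the argument order instead of moving data. */
#define ROUND(n, a, b) do { (a) ^= key[n]; (b) ^= toy_f(a); } while (0)

/* Fully unrolled: no counter, no branch, no pointer update. */
uint32_t rounds_unrolled(const uint32_t *key, uint32_t l, uint32_t r)
{
    ROUND(0, l, r);   ROUND(1, r, l);   ROUND(2, l, r);   ROUND(3, r, l);
    ROUND(4, l, r);   ROUND(5, r, l);   ROUND(6, l, r);   ROUND(7, r, l);
    ROUND(8, l, r);   ROUND(9, r, l);   ROUND(10, l, r);  ROUND(11, r, l);
    ROUND(12, l, r);  ROUND(13, r, l);  ROUND(14, l, r);  ROUND(15, r, l);
    return l ^ r;
}

/* Sanity check: both variants must compute the same value. */
void check_unrolled_matches_rolled(void)
{
    uint32_t key[ROUNDS];
    int i;

    for (i = 0; i < ROUNDS; i++)
        key[i] = 0x01010101u * (uint32_t)(i + 1);
    assert(rounds_rolled(key, 0x12345678u, 0x9ABCDEF0u) ==
           rounds_unrolled(key, 0x12345678u, 0x9ABCDEF0u));
}
```

Comparing the disassembly of the two functions on the target in question
would show how much of the per-iteration overhead actually goes away.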