Date: Sat, 13 Jul 2013 14:47:59 -0400 From: Yaniv Sapir <yaniv@...pteva.com> To: john-dev@...ts.openwall.com Subject: Re: Parallella: bcrypt Katja, For/While-loops are translated to B<cond> instructions. This means that for every iteration (but the last one) you'll pay that penalty. This is why you benefit form loop unrolling. However, the Epiphany has an undocumented (yet) feature for hardware loops. This was described in a forum post from a couple of weeks ago. Under certain limitations for those loops, the penalty is eliminated. If you can find that post, you can implement that method (in assembly, for now). On Sat, Jul 13, 2013 at 2:41 PM, Katja Malvoni <kmalvoni@...il.com> wrote: > Hi Yaniv, Alexander, > > My assembly code doesn't produce correct results but it has same > execution time (20.7 ms for all cores and cost 5) as the one that produces > correct result and it uses add instruction. When using fadd/iadd execution > time is 22.6 ms. > Except that, I don't see any other way to optimize BF_ROUND - C code is > very optimal, almost every line corresponds to one instruction (two that > don't are "tmp3 += BF_INDEX(ctx->s.S, tmp4);" and "R ^= ctx->s.P[N + > 1];"). > In case of higher cost, my code is much slower (2403.999 ms vs 2182.041 msfor cost 12 and 9603.889 ms vs 8716.686 > ms for cost 14) - I didn't take care about pipeline structure and hazards > and I don't think I'll be able to change order of instructions better than > compiler did it. > > Yaniv, Epiphany Architecture Reference, p.69: "The branch prediction > mechanism used by the CPU assumes that the branch was not taken. There is > no penalty for branches not taken. For branches that are taken, there is a > three-cycle constant penalty." - is this true for loops as well? If yes, in > case of BF_encrypt this means penalty of (3 cycles * 1042) * 2^cost. Can > that be avoided somehow? > > On Thu, Jun 27, 2013 at 5:08 PM, Solar Designer <solar@...nwall.com>wrote: > >> [...] >> >> We may try two things: >> >> 1. Interleave two instances of bcrypt. >> >> and/or >> >> 2. Rewrite this function in assembly. The compiler-generated code does >> look suboptimal. >> [...] >> > > I'll switch to other approach - interleaving two instances of bcrypt and > I'll try to use Dual-Issue Scheduling Rules. > > Best regards, > > Katja > -- =========================================================== Yaniv Sapir Adapteva Inc. 1666 Massachusetts Ave, Suite 14 Lexington, MA 02420 Phone: (781)-328-0513 (x104) Email: yaniv@...pteva.com Web: www.adapteva.com ============================================================ CONFIDENTIALITY NOTICE: This e-mail may contain information that is confidential and proprietary to Adapteva, and Adapteva hereby designates the information in this e-mail as confidential. The information is intended only for the use of the individual or entity named above. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of any of the information contained in this transmission is strictly prohibited and that you should immediately destroy this e-mail and its contents and notify Adapteva. ============================================================== Content of type "text/html" skipped
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.