Date: Mon, 29 Jul 2013 19:45:52 -0400
From: Yaniv Sapir <yaniv@...pteva.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

On Mon, Jul 29, 2013 at 7:38 PM, Solar Designer <solar@...nwall.com> wrote:

> Katja, Yaniv -
>
> On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
> > I moved to a separate assembly file and my code from yesterday worked. I
> > implemented the whole BF_encrypt2() in assembly.
> > There are not enough registers to preload both P arrays so I'm preloading
> > only one.
>
> How is that - not enough registers to preload both P arrays?  We got 64
> registers and little demand for them other than for the two P's (need 36
> for them).
>

I was thinking the same thing. If you run out of regs with 64 of them
available, then you should start reusing registers once their values are
not required anymore.

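The register budget can be written down as a small C sketch. The 64 registers and the 36 needed for the two P's are the figures quoted above; the 18-entry size of a Blowfish P array is standard. The constant names and the `reserved` parameter are made up for illustration, and the sketch does not model which registers the Epiphany ABI or the rest of the round code actually need.

```c
/* Hypothetical names; only the numbers 64, 18 and 36 come from the thread
 * and from the Blowfish definition. */
enum {
    TOTAL_REGS    = 64,  /* registers available, as stated in the thread */
    BF_P_ENTRIES  = 18,  /* one Blowfish P array: 16 rounds + 2 */
    BF_INSTANCES  = 2,   /* two instances, hence "both P arrays" */
    P_REGS_NEEDED = BF_P_ENTRIES * BF_INSTANCES,
    REGS_LEFT     = TOTAL_REGS - P_REGS_NEEDED
};

/* Compile-time check that the figure quoted in the thread is consistent. */
_Static_assert(P_REGS_NEEDED == 36, "two P arrays need 36 registers");

/* Returns 1 if both P arrays fit when `reserved` registers are kept for
 * everything else (assumption: the caller knows that number). */
static int p_arrays_fit(int reserved)
{
    return P_REGS_NEEDED + reserved <= TOTAL_REGS;
}
```

With these numbers, 28 registers remain for L, R, temporaries and pointers, so reusing temporaries once their values are dead should be enough as long as no more than 28 are live at once.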

> > Speed is 1175 c/s.
>
> Good, but it should be 1200 c/s with both P's preloaded. ;-)
>
> > Code is in https://github.com/kmalvoni/JohnTheRipper/tree/master
> >
> > I left two rounds in one macro - since there needs to be 4 cycles between
> > ALU following FPU instruction to have dual issue, with only one round it's
> > not possible to have shift right by 22 on ALU and FPU for both instances
> > and to use iadd at the end. With two rounds in one macro one shift right by
> > 22 and one add are not parallelised for one macro.
>
> I find the above description a bit confusing, but I understand the
> general issue.  OK.
>
> Have you tried replacing the right shift by 22 followed by AND with
> right shift by 24 followed by IMUL?  (AND is non-free, whereas IMUL is
> potentially free.)
>
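For reference, the identity behind the shift/IMUL suggestion can be checked in plain C. This is only a sketch of the arithmetic, not the Epiphany assembly under discussion. It assumes the AND mask is 0x3FC (so the result is a byte offset into a 256-entry table of 32-bit words) and that the IMUL multiplies by 4; neither constant is stated in the thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset for the top byte of x: shift right by 22, then mask off the
 * two low bits. The mask 0x3FC is an assumption, see the note above. */
static uint32_t offset_shift_and(uint32_t x)
{
    return (x >> 22) & 0x3FCu;
}

/* Same offset via shift right by 24, then multiply by the entry size (4). */
static uint32_t offset_shift_mul(uint32_t x)
{
    return (x >> 24) * 4u;
}

/* Checks that both forms agree for every top byte, with a few different
 * patterns in the low 24 bits, which must not affect the result. */
static void check_offsets_agree(void)
{
    const uint32_t low_bits[] = { 0x000000u, 0xFFFFFFu, 0xA5C3E1u };

    for (uint32_t top = 0; top < 256; top++) {
        for (int i = 0; i < 3; i++) {
            uint32_t x = (top << 24) | low_bits[i];
            assert(offset_shift_and(x) == offset_shift_mul(x));
        }
    }
}
```

The two forms give the same offset; whether the multiply is actually free depends on it being dual-issued alongside an ALU instruction, which only a measurement on the hardware can confirm.
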
> > When I used r2 and r3 for L0 and R0, before preloading P array, speed was
> > 1083 c/s. But when I changed to r48 and r56 it became 1136 c/s. I guess
> > it's because with r2 and r3 some instructions were 16-bit.
>
> That's puzzling.  Yaniv - is dual-issue (or something else) hampered by
> having some 16-bit instructions inter-mixed with 32-bit ones?
>
>

No, to the best of my understanding. But I will verify that tomorrow. What
I was thinking, though, is that there is some register dependency
introduced by the new regs. Is this possible?

Yaniv.
