Date: Tue, 30 Jul 2013 04:16:54 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja, Yaniv -

On Tue, Jul 30, 2013 at 01:59:10AM +0200, Katja Malvoni wrote:
> On Tue, Jul 30, 2013 at 1:38 AM, Solar Designer <solar@...nwall.com> wrote:
> > How is that - not enough registers to preload both P arrays?  We got 64
> > registers and little demand for them other than for the two P's (need 36
> > for them).
> 
> 8 for tmp1-4 for both instances,

Possibly you could get away with fewer than 8 by reusing some of them.
The code might be more readable with 8 separate temporaries, though.

> 10 for pointers (P, S[0], S[1], S[2], S[3]),

Oh, we pay a price for Epiphany lacking addressing modes with both index
and constant displacement at once.  And you explained you can't move the
addition to IMADD because that instruction would overwrite the register.

> 2 for ptr and end for controlling the loop, 2 as offset between
> first and second BF_ctx,

Why two registers for that offset?  Isn't it just one offset?  And why do
you need the offset?  As an alternative, you could use two ptr's (but
only one "end", since the loop iteration count is obviously and
necessarily the same for both instances), right?  So that would be one
register (the second "ptr") instead of two (offsets), no?
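
To illustrate what I mean, here is a minimal C sketch (not the actual
Epiphany code; the function and parameter names are made up for this
example) of a loop that walks two instances in lockstep using two ptr's
and a single "end":

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: XOR the same value into two equally sized
 * buffers.  Only ptr0 is compared against "end"; ptr1 advances in
 * lockstep, so no separate offset needs to be kept in a register.
 */
void xor_two_instances(uint32_t *ptr0, uint32_t *ptr1, size_t count,
    uint32_t value)
{
	uint32_t *end = ptr0 + count;

	while (ptr0 < end) {
		*ptr0++ ^= value;
		*ptr1++ ^= value;
	}
}
```

Whether the compiler or hand-written assembly actually ends up with one
register fewer depends on the rest of the loop body, so treat this as a
sketch of the idea only.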

> 3 for constants (0xff, 0x3fc and 4 for imul), 4
> for R0, L0, R1, L1, 2 as function arguments which gives 31. There are 33
> left, I can't use stack pointer so that's 32.

OK.
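
For reference, the tally above can be written down as a small C sketch.
The names are mine, purely for illustration; the counts are the ones
quoted in this thread, and P_WORDS is the 18-word Blowfish P array,
which is where the 36 for two instances comes from:

```c
/* Register budget as quoted in this thread (illustrative names). */
enum {
	TOTAL_REGS = 64,
	TEMPORARIES = 8,	/* tmp1-4 for both instances */
	POINTERS = 10,		/* P, S[0..3] for both instances */
	LOOP_CONTROL = 2,	/* ptr and end */
	OFFSETS = 2,		/* between first and second BF_ctx */
	CONSTANTS = 3,		/* two masks and 4 for imul */
	HALVES = 4,		/* R0, L0, R1, L1 */
	ARGUMENTS = 2,
	STACK_POINTER = 1,	/* not usable for data */
	P_WORDS = 18		/* Blowfish P array size in words */
};

/* Registers already spoken for. */
int regs_used(void)
{
	return TEMPORARIES + POINTERS + LOOP_CONTROL + OFFSETS +
	    CONSTANTS + HALVES + ARGUMENTS;
}

/* Registers left over, excluding the stack pointer. */
int regs_free(void)
{
	return TOTAL_REGS - regs_used() - STACK_POINTER;
}

/* Registers needed to preload P for both instances. */
int regs_needed_for_two_p(void)
{
	return 2 * P_WORDS;
}
```

With these numbers, regs_used() is 31, regs_free() is 32, and
regs_needed_for_two_p() is 36.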

> If I preload P array then I don't need pointer to P array so that's 34.

Perhaps you meant 30, not 34?

> What about r28-r31? I thought I can't use those.

Yaniv - can you answer this?

> If I can then there must be a way to find two "missing" registers.

Even if you preload a portion of the second P array, that could be
beneficial.  You would still have a few load instructions for some P
elements in the loop, but there would be fewer of those loads than with
nothing preloaded.

> > Have you tried replacing the right shift by 22 followed by AND with
> > right shift by 24 followed by IMUL?  (AND is non-free, whereas IMUL is
> > potentially free.)
> 
> I am doing that 3 out of 4 times in one macro.

Oh, you're right.  You write 0x18 where I expected to see 24. ;-)
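
For the record, here is a small C sketch of the equivalence I had in
mind.  The function names are hypothetical, and I am assuming the mask
in question is 0x3fc (that is, 0xff << 2), used to form a byte offset
into a 256-entry table of 32-bit words:

```c
#include <stdint.h>

/* Byte offset of the S-box entry for the top byte: shift, then AND. */
uint32_t index_shift_and(uint32_t x)
{
	return (x >> 22) & 0x3fc;
}

/* Same offset: shift by 24 (0x18), then multiply by 4. */
uint32_t index_shift_mul(uint32_t x)
{
	return (x >> 24) * 4;
}
```

Both return the same value for any 32-bit input, since the shift by 24
already leaves only the top 8 bits and the multiply by 4 puts them in
the same position the mask would keep.  Whether the multiply is really
free depends on how the instructions get scheduled on Epiphany.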

Thanks,

Alexander
