Date: Tue, 30 Jul 2013 14:18:04 +0200
From: Katja Malvoni <>
Subject: Re: Parallella: bcrypt

Hi Yaniv, Alexander,

On Tue, Jul 30, 2013 at 1:45 AM, Yaniv Sapir <> wrote:

> On Mon, Jul 29, 2013 at 7:38 PM, Solar Designer <> wrote:
>> On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
>> > When I used r2 and r3 for L0 and R0, before preloading P array, speed
>> > was 1083 c/s. But when I changed to r48 and r56 it became 1136 c/s. I
>> > guess it's because with r2 and r3 some instructions were 16-bit.
>> That's puzzling.  Yaniv - is dual-issue (or something else) hampered by
>> having some 16-bit instructions inter-mixed with 32-bit ones?
> No, to the best of my understanding. But I will verify that tomorrow.
> What I was thinking, though, is that there is some register dependency
> introduced by the new regs. Is this possible?

It's not possible. In both cases the register pair (r2, r3 or r48, r56) is
used only as L0 and R0. When I switched to r48 and r56, the only thing I
did was find & replace.

On Tue, Jul 30, 2013 at 2:16 AM, Solar Designer <> wrote:

> On Tue, Jul 30, 2013 at 01:59:10AM +0200, Katja Malvoni wrote:
> > If I preload P array then I don't need pointer to P array so that's 34.
> Perhaps you meant 30, not 34?

I was counting "free" registers; in that case it's 34. If counting used
registers, then it's 30.
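
The two counts describe the same allocation from both sides. A minimal
sketch of the arithmetic, assuming the 64 general-purpose registers of an
Epiphany core (the total is not stated in this message, and the names below
are only illustrative):

```c
#include <assert.h>

enum {
    TOTAL_REGS = 64, /* assumption: size of the Epiphany register file */
    USED_REGS  = 30, /* used registers once P is preloaded (no P pointer) */
    FREE_REGS  = TOTAL_REGS - USED_REGS
};

/* Hypothetical helper: free registers left for a given number in use. */
static int free_regs(int used)
{
    assert(used >= 0 && used <= TOTAL_REGS);
    return TOTAL_REGS - used;
}
```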

On Tue, Jul 30, 2013 at 2:47 AM, Solar Designer <> wrote:

> Perhaps you should change your code to transferring just one struct?
> I wouldn't be surprised if this gives us a few c/s extra.


On Tue, Jul 30, 2013 at 2:51 AM, Solar Designer <> wrote:

> On Tue, Jul 30, 2013 at 04:16:54AM +0400, Solar Designer wrote:
> > As an alternative, you could use two ptr's (but
> > only one "end", since the loop iteration count is obviously and
> > necessarily the same for both instances), right?  So that would be one
> > register (the second "ptr") instead of two (offsets), no?
> I (still) think this may work.  Then you'd need to be updating two
> ptr's, but you might be able to use a free IADD for that.

Unfortunately, I can't have it for free. IADD doesn't support an immediate
constant, so I have to put 8 in a register to be able to do the addition. Or
do two additions with 4, but that won't be free.
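
For reference, the two-pointer scheme under discussion looks roughly like
this in C. This is only a sketch: the function name, the buffers and the
stand-in arithmetic are made up for illustration, and the real code is
Epiphany assembly. The point is that both instances share one "end" bound,
and the 8-byte step lives in a variable because, as noted above, it has to
sit in a register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: two interleaved instances store (L, R) pairs through
 * two pointers, with a single end bound because both run the same number of
 * iterations. */
static void fill_two_instances(uint32_t *buf0, uint32_t *buf1, size_t words)
{
    uint32_t *ptr0 = buf0;
    uint32_t *ptr1 = buf1;
    uint32_t *const end0 = buf0 + words; /* the only loop bound */
    const size_t step = 2;               /* 2 words = 8 bytes, held in a "register" */
    uint32_t L0 = 0, R0 = 0, L1 = 0, R1 = 0;

    assert(words % 2 == 0);              /* pairs only */
    while (ptr0 != end0) {
        /* Stand-in for the Blowfish rounds of each instance. */
        L0 += 1; R0 += 2;
        L1 += 3; R1 += 4;
        ptr0[0] = L0; ptr0[1] = R0;
        ptr1[0] = L1; ptr1[1] = R1;
        ptr0 += step;                    /* two pointer updates per iteration */
        ptr1 += step;
    }
}
```

Compared with one pointer plus two offsets, this keeps one extra pointer in
a register, plus the register holding the step.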

When I do test with speed is 727 c/s. It seems that interleaving
is not used (but even without interleaving it shouldn't be this slow). The self
test on the same code gives a speed of 1175 or 1177 c/s. MAX_KEYS_PER_CRYPT is
defined as EPIPHANY_CORES*2 so every crypt_all() call should compute 32
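
A sketch of the key count implied by that definition. EPIPHANY_CORES = 16 is
an assumption inferred from the 32 above, and the index-to-core mapping
functions are hypothetical, not taken from the actual code:

```c
#include <assert.h>

#define EPIPHANY_CORES     16  /* assumption: 16-core Epiphany chip */
#define INSTANCES_PER_CORE 2   /* two interleaved instances per core */
#define MAX_KEYS_PER_CRYPT (EPIPHANY_CORES * 2)

/* Hypothetical mapping: consecutive key indices share a core. */
static int key_core(int index)
{
    assert(index >= 0 && index < MAX_KEYS_PER_CRYPT);
    return index / INSTANCES_PER_CORE;
}

/* Hypothetical mapping: which of the two instances on that core. */
static int key_instance(int index)
{
    assert(index >= 0 && index < MAX_KEYS_PER_CRYPT);
    return index % INSTANCES_PER_CORE;
}
```

With this layout, a full batch of 32 keys gives every core both of its
instances to interleave.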

