Date: Tue, 30 Jul 2013 14:18:04 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Yaniv, Alexander,

On Tue, Jul 30, 2013 at 1:45 AM, Yaniv Sapir <yaniv@...pteva.com> wrote:

> On Mon, Jul 29, 2013 at 7:38 PM, Solar Designer <solar@...nwall.com> wrote:
>
>> On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
>> > When I used r2 and r3 for L0 and R0, before preloading P array, speed was
>> > 1083 c/s. But when I changed to r48 and r56 it became 1136 c/s. I guess
>> > it's because with r2 and r3 some instructions were 16-bit.
>>
>> That's puzzling.  Yaniv - is dual-issue (or something else) hampered by
>> having some 16-bit instructions inter-mixed with 32-bit ones?
>>
> No, to the best of my understanding. But I will verify that tomorrow.
> What I was thinking, though, is that there is some register dependency
> introduced by the new regs. Is this possible?
>

It's not possible. In both cases, each pair of registers (r2, r3 or r48,
r56) is used only for L0 and R0. When I switched to r48 and r56, the only
thing I did was a find & replace.

On Tue, Jul 30, 2013 at 2:16 AM, Solar Designer <solar@...nwall.com> wrote:

> On Tue, Jul 30, 2013 at 01:59:10AM +0200, Katja Malvoni wrote:
> > If I preload the P array then I don't need a pointer to it, so that's 34.
>
> Perhaps you meant 30, not 34?
>

I was counting "free" registers; in that case it's 34. If counting used
registers, then it's 30.

On Tue, Jul 30, 2013 at 2:47 AM, Solar Designer <solar@...nwall.com> wrote:

> Perhaps you should change your code to transferring just one struct?
> I wouldn't be surprised if this gives us a few c/s extra.
>

Done.
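For anyone following along, a rough illustration of the "one struct" idea:
instead of several separate host-to-core copies (key, salt, result), all
inputs and output slots live in one packed buffer, so a single copy suffices
each way. The sketch below uses Python's struct module only to model such a
layout. The field set, sizes, and the names pack_job/unpack_results are
hypothetical, not the actual Parallella bcrypt structure.

```python
import struct

# Hypothetical layout of one transfer buffer: a shared cost field followed by
# two interleaved instances, each with an 18-word expanded key, a 4-word salt
# and a 6-word result slot.  "<" = little-endian, standard sizes, no padding.
WORDS_PER_INSTANCE = 18 + 4 + 6
LAYOUT = struct.Struct("<I" + "I" * (WORDS_PER_INSTANCE * 2))

def pack_job(cost, keys, salts):
    """Pack both instances into one bytes object (one copy instead of several)."""
    fields = [cost]
    for key, salt in zip(keys, salts):
        assert len(key) == 18 and len(salt) == 4
        fields += key + salt + [0] * 6  # result slot starts zeroed
    return LAYOUT.pack(*fields)

def unpack_results(buf):
    """Extract the two 6-word result slots from a returned buffer."""
    words = LAYOUT.unpack(buf)
    results = []
    for i in range(2):
        base = 1 + i * WORDS_PER_INSTANCE + 18 + 4
        results.append(list(words[base:base + 6]))
    return results

buf = pack_job(5, [[1] * 18, [2] * 18], [[3] * 4, [4] * 4])
print(len(buf))  # 228 bytes: 57 words of 4 bytes
```

The design point is only that fewer, larger transfers tend to cost less
fixed overhead than several small ones; whether it buys "a few c/s" depends
on the real transfer mechanism.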

On Tue, Jul 30, 2013 at 2:51 AM, Solar Designer <solar@...nwall.com> wrote:

> On Tue, Jul 30, 2013 at 04:16:54AM +0400, Solar Designer wrote:
> > As an alternative, you could use two ptr's (but
> > only one "end", since the loop iteration count is obviously and
> > necessarily the same for both instances), right?  So that would be one
> > register (the second "ptr") instead of two (offsets), no?
>
> I (still) think this may work.  Then you'd need to be updating two
> ptr's, but you might be able to use a free IADD for that.
>

Unfortunately, I can't get it for free. IADD doesn't support an immediate
constant, so I have to put 8 in a register to be able to do the addition.
Alternatively I could do two additions of 4, but that won't be free either.
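To make the "two ptr's, one end" idea concrete, here is a toy model of a
table-filling loop for two interleaved instances. Both instances run the same
number of iterations, so only ptr0 is compared against end; ptr1 just advances
in lockstep. The step of 8 bytes (one L/R pair) is held in a variable, mirroring
the point that IADD has no immediate operand. toy_encrypt() is a hypothetical
stand-in for BF_ENCRYPT and is NOT Blowfish; the memory model is illustrative.

```python
MASK = 0xFFFFFFFF

def toy_encrypt(l, r):
    """Hypothetical placeholder round function (not Blowfish)."""
    return (r ^ 0x9E3779B9) & MASK, (l + r) & MASK

def fill_two_tables(mem, base0, base1, nbytes):
    """Fill two tables of 32-bit words; mem is indexed by word, ptrs are in bytes."""
    step = 8              # kept in a "register": IADD takes no immediate
    ptr0, ptr1 = base0, base1
    end = base0 + nbytes  # a single end: both instances share the trip count
    l0, r0 = 0, 0         # instance 0 state
    l1, r1 = 1, 0         # instance 1 state (different start, independent)
    while ptr0 < end:
        l0, r0 = toy_encrypt(l0, r0)
        l1, r1 = toy_encrypt(l1, r1)
        mem[ptr0 // 4], mem[ptr0 // 4 + 1] = l0, r0
        mem[ptr1 // 4], mem[ptr1 // 4 + 1] = l1, r1
        ptr0 += step      # two pointer updates per iteration:
        ptr1 += step      # this is the extra add being discussed
    return ptr0, ptr1

# Two 18-word P-sized tables placed back to back (72 bytes each).
mem = [0] * 36
print(fill_two_tables(mem, 0, 72, 72))  # (72, 144)
```

The cost trade-off is visible in the loop body: one end register is saved,
but the second pointer update is only free if it can be issued alongside
other work.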

When I test with BF_tst.in, the speed is 727 c/s. It seems that interleaving
is not being used (but even without interleaving it shouldn't be this slow).
The self-test on the same code gives 1175 or 1177 c/s. MAX_KEYS_PER_CRYPT is
defined as EPIPHANY_CORES*2, so shouldn't every crypt_all() call compute 32
hashes?
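A quick sanity check of the arithmetic behind that question: with 16 cores
(assumed here, since 32 = 16*2) and two interleaved instances per core, a
full crypt_all() call covers 32 keys. If a call receives fewer keys than
that, some instances sit idle, which would lower the effective c/s. The
helper names below are hypothetical and only illustrate the counting.

```python
EPIPHANY_CORES = 16      # assumed: E16 chip, consistent with 32 = 16 * 2
INSTANCES_PER_CORE = 2   # two interleaved bcrypt instances per core
MAX_KEYS_PER_CRYPT = EPIPHANY_CORES * INSTANCES_PER_CORE

def crypt_all_calls(candidates, batch=MAX_KEYS_PER_CRYPT):
    """Calls needed if every crypt_all() call is filled up to `batch` keys."""
    return -(-candidates // batch)  # ceiling division

def idle_instances(count, batch=MAX_KEYS_PER_CRYPT):
    """Instances left unused in a call that receives `count` keys."""
    return batch - count

print(MAX_KEYS_PER_CRYPT)   # 32
print(crypt_all_calls(100)) # 4
print(idle_instances(16))   # 16: half the instances do no useful work
```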

Katja
