Date: Tue, 30 Jul 2013 14:18:04 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Yaniv, Alexander,

On Tue, Jul 30, 2013 at 1:45 AM, Yaniv Sapir <yaniv@...pteva.com> wrote:
> On Mon, Jul 29, 2013 at 7:38 PM, Solar Designer <solar@...nwall.com> wrote:
>
>> On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
>> > When I used r2 and r3 for L0 and R0, before preloading P array, speed was
>> > 1083 c/s. But when I changed to r48 and r56 it became 1136 c/s. I guess
>> > it's because with r2 and r3 some instructions were 16-bit.
>>
>> That's puzzling. Yaniv - is dual-issue (or something else) hampered by
>> having some 16-bit instructions inter-mixed with 32-bit ones?
>>
> No, to the best of my understanding. But I will verify that tomorrow.
> What I was thinking, though, is that there is some register dependency
> introduced by the new regs. Is this possible?
>

It's not possible. In both cases, both pairs of registers (r2, r3 and
r48, r56) are used only as L0 and R0. When I switched to r48 and r56,
the only thing I did was find & replace.

On Tue, Jul 30, 2013 at 2:16 AM, Solar Designer <solar@...nwall.com> wrote:
> On Tue, Jul 30, 2013 at 01:59:10AM +0200, Katja Malvoni wrote:
> > If I preload P array then I don't need pointer to P array so that's 34.
>
> Perhaps you meant 30, not 34?
>

I was counting "free" registers; in that case it's 34. If counting used
registers, then it's 30.

On Tue, Jul 30, 2013 at 2:47 AM, Solar Designer <solar@...nwall.com> wrote:
> Perhaps you should change your code to transferring just one struct?
> I wouldn't be surprised if this gives us a few c/s extra.
>

Done.

On Tue, Jul 30, 2013 at 2:51 AM, Solar Designer <solar@...nwall.com> wrote:
> On Tue, Jul 30, 2013 at 04:16:54AM +0400, Solar Designer wrote:
> > As an alternative, you could use two ptr's (but
> > only one "end", since the loop iteration count is obviously and
> > necessarily the same for both instances), right? So that would be one
> > register (the second "ptr") instead of two (offsets), no?
>
> I (still) think this may work. Then you'd need to be updating two
> ptr's, but you might be able to use a free IADD for that.
>

Unfortunately, I can't have it free. IADD doesn't support an immediate
constant, so I have to put 8 in a register to be able to do the addition.
Or I could do two additions with 4, but that won't be free.

When I test with BF_tst.in, the speed is 727 c/s. It seems that
interleaving is not used (but even without interleaving it shouldn't be
this slow). The self test on the same code gives a speed of 1175 or
1177 c/s. MAX_KEYS_PER_CRYPT is defined as EPIPHANY_CORES*2, so every
crypt_all() call should compute 32 hashes?

Katja
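
For reference, here is a minimal C sketch of the "two ptr's, one end" idea discussed above. It is not the actual Epiphany assembly: fill_two() and fake_encrypt() are hypothetical names, and fake_encrypt() is only a stand-in for the real Blowfish encryption. The point is the loop shape: both instances advance in lockstep, so only the first pointer is compared against an end pointer, and each pointer moves by two 32-bit words (8 bytes) per iteration, which is the constant 8 that would have to sit in a register for IADD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one Blowfish encryption of (L, R).
 * Hypothetical: NOT the real round function, just something deterministic. */
static void fake_encrypt(uint32_t *l, uint32_t *r)
{
    *l += 1;
    *r += 2;
}

/* Fill two equally sized tables (one per interleaved instance).
 * Assumption: words is even and nonzero, since each step stores an (L, R) pair. */
void fill_two(uint32_t *a, uint32_t *b, size_t words)
{
    uint32_t la = 0, ra = 0;       /* L0, R0 of instance A */
    uint32_t lb = 100, rb = 200;   /* L0, R0 of instance B */
    uint32_t *ptr_a = a;
    uint32_t *ptr_b = b;           /* second pointer: one extra register */
    uint32_t *end_a = a + words;   /* the only end pointer */

    assert(words > 0 && words % 2 == 0);

    do {
        fake_encrypt(&la, &ra);
        fake_encrypt(&lb, &rb);
        ptr_a[0] = la;
        ptr_a[1] = ra;
        ptr_b[0] = lb;
        ptr_b[1] = rb;
        ptr_a += 2;                /* advance by 8 bytes */
        ptr_b += 2;                /* same step, no separate end check */
    } while (ptr_a < end_a);
}
```

Compared with one base pointer plus two offsets, this keeps one live value fewer, at the cost of a second pointer update per iteration.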