Date: Wed, 31 Jul 2013 06:19:08 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

On Tue, Jul 30, 2013 at 10:57:09PM +0200, Katja Malvoni wrote:
> I implemented preload of second P array, code is committed. I got one
> register by reusing one of temporaries and I got another one by changing
> two offsets for one ptr. I'm getting 1192 c/s. I expected higher speed and
> I think this is because something is not dual-issued for the second
> instance second BF_ROUND in macro. At the end of the macro, load from P
> array was ensuring 4 cycles separation between iadd and corresponding eor.
> I still haven't figured out what is not dual-issued and why.

It looks like your code terminates abruptly, with no return from the
function. I am surprised it works at all. Does it possibly hit another
function body, execute that, and then return using its epilogue? %-)

https://github.com/kmalvoni/JohnTheRipper/blob/master/src/parallella_e_bcrypt.s

Another thing I noticed is that you're not yet using LDRD to preload P's
(instead, you preload the elements one by one). I think you should.

Alexander