Date: Tue, 30 Jul 2013 07:44:50 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
> I moved to a separate assembly file and my code from yesterday worked. I
> implemented the whole BF_encrypt2() in assembly.
> There are not enough registers to preload both P arrays, so I'm preloading
> only one. Speed is 1175 c/s.
> Code is in https://github.com/kmalvoni/JohnTheRipper/tree/master

Here are some more comments:

I think you may (and should) use the STRD and LDRD (double-register)
instructions to save/load registers to/from the stack, and to preload P.
That way you'll need half as many instructions (and perhaps cycles) for
this. Of course, this is outside the loop, so it's not a big deal.

The four MOVs near the end of each loop iteration can probably be avoided
by having the last Blowfish round write into different registers right
away. Sure, we do have an equivalent of these MOVs in the C source, and we
have the MOVs in the x86 asm code for bcrypt, but on archs with 3-operand
instructions I think they're avoidable. Of course, this cost is only
incurred once per 16 rounds of Blowfish, so it's not a big deal either.

Alexander
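A minimal C sketch of the MOV-avoidance idea above, under stated assumptions. It uses a Blowfish-shaped Feistel network whose P-array and S-boxes are filled with pseudo-random toy values (not the real Blowfish/bcrypt constants), and all names here (toy_bf_ctx, encrypt_with_swap, rounds_no_swap, and so on) are hypothetical, not taken from JtR's sources or from the linked repository. The reference form ends each encryption with an explicit swap of the halves. The alternative unrolls chained encryptions by two and lets the two variables trade the L/R roles, so the swap disappears. In C the compiler allocates registers itself, so this only illustrates the dataflow a hand-written assembly loop could follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy context: Blowfish-shaped tables filled with
 * pseudo-random words, NOT the real Blowfish initial values. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} toy_bf_ctx;

static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return *s = x;
}

static void toy_bf_init(toy_bf_ctx *c, uint32_t seed) {
    uint32_t s = seed ? seed : 1u;   /* xorshift32 must not start at 0 */
    for (int i = 0; i < 18; i++)
        c->P[i] = xorshift32(&s);
    for (int b = 0; b < 4; b++)
        for (int i = 0; i < 256; i++)
            c->S[b][i] = xorshift32(&s);
}

/* Blowfish's F function: four S-box lookups combined with add/xor/add. */
static uint32_t bf_f(const toy_bf_ctx *c, uint32_t x) {
    return ((c->S[0][x >> 24] + c->S[1][(x >> 16) & 0xff])
            ^ c->S[2][(x >> 8) & 0xff]) + c->S[3][x & 0xff];
}

/* Reference form: 16 rounds, then an explicit swap (the "MOVs"). */
static void encrypt_with_swap(const toy_bf_ctx *c, uint32_t *L, uint32_t *R) {
    uint32_t l = *L ^ c->P[0], r = *R, tmp;
    for (int i = 0; i < 16; i += 2) {
        r ^= bf_f(c, l) ^ c->P[i + 1];
        l ^= bf_f(c, r) ^ c->P[i + 2];
    }
    tmp = r;              /* swap the halves...        */
    r = l;
    l = tmp ^ c->P[17];   /* ...and fold in P[17]      */
    *L = l;
    *R = r;
}

/* Same 16 rounds with *x playing L and *y playing R, but no swap:
 * afterwards the new L is in *y and the new R is in *x. */
static void rounds_no_swap(const toy_bf_ctx *c, uint32_t *x, uint32_t *y) {
    uint32_t l = *x ^ c->P[0], r = *y;
    for (int i = 0; i < 16; i += 2) {
        r ^= bf_f(c, l) ^ c->P[i + 1];
        l ^= bf_f(c, r) ^ c->P[i + 2];
    }
    *x = l;               /* new R */
    *y = r ^ c->P[17];    /* new L */
}

/* n chained encryptions, each ending with the swap. */
static void chain_with_swap(const toy_bf_ctx *c, uint32_t *L, uint32_t *R,
                            size_t n) {
    for (size_t k = 0; k < n; k++)
        encrypt_with_swap(c, L, R);
}

/* n chained encryptions unrolled by two: a and b trade roles instead
 * of being moved, so after each pair they are back in L/R order. */
static void chain_no_swap(const toy_bf_ctx *c, uint32_t *L, uint32_t *R,
                          size_t n) {
    uint32_t a = *L, b = *R;
    assert(n % 2 == 0);
    for (size_t k = 0; k < n; k += 2) {
        rounds_no_swap(c, &a, &b);   /* new L in b, new R in a */
        rounds_no_swap(c, &b, &a);   /* new L in a, new R in b */
    }
    *L = a;
    *R = b;
}
```

Both chains compute the same values; the unrolled one just never needs a temporary. The trade-off is a loop body twice as long, which matters on a core with little local memory.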