Date: Tue, 30 Jul 2013 00:44:09 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hello,

I moved the code to a separate assembly file, and my code from yesterday worked. I implemented the whole of BF_encrypt2() in assembly. There are not enough registers to preload both P arrays, so I'm preloading only one. Speed is 1175 c/s. Code is at https://github.com/kmalvoni/JohnTheRipper/tree/master

I left two rounds in one macro. Dual issue needs 4 cycles between an FPU instruction and the ALU instruction that follows it, so with only one round it's not possible to have the shift right by 22 on both the ALU and the FPU for both instances and still use iadd at the end. With two rounds in one macro, one shift right by 22 and one add per macro are not parallelised.

When I used r2 and r3 for L0 and R0, before preloading the P array, speed was 1083 c/s. When I changed to r48 and r56, it became 1136 c/s. I guess that's because with r2 and r3 some instructions were 16-bit.

Katja
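
For readers following along, here is a minimal C sketch of what two Blowfish rounds for two interleaved instances look like, and of one likely reading of the "shift right by 22". It assumes that shift computes the byte offset of the first S-box entry directly (index times 4), which the message does not state. The names bf_ctx, offset_plain, offset_scaled, bf_f and bf_two_rounds_x2 are hypothetical and are not taken from the repository above.

```c
#include <stdint.h>

/* Hypothetical context: one P array and four S-boxes per instance. */
struct bf_ctx {
    uint32_t P[18];
    uint32_t S[4][256];
};

/* Byte offset of the S[0] entry selected by the top byte of x:
   extract the index, then scale it by sizeof(uint32_t). */
static uint32_t offset_plain(uint32_t x)
{
    return (x >> 24) * 4u;
}

/* Same offset with one shift and one mask. Shifting by 22 instead of 24
   leaves the index already multiplied by 4; the mask clears the two low
   bits, which belong to the next byte of x. */
static uint32_t offset_scaled(uint32_t x)
{
    return (x >> 22) & 0x3FCu;
}

/* Standard Blowfish F function. */
static uint32_t bf_f(const struct bf_ctx *c, uint32_t x)
{
    return ((c->S[0][x >> 24] + c->S[1][(x >> 16) & 0xFF])
            ^ c->S[2][(x >> 8) & 0xFF]) + c->S[3][x & 0xFF];
}

/* Rounds n and n+1 for two independent instances a and b. The statements
   alternate between the instances so that neighbouring operations do not
   depend on each other, which is what gives a dual-issue core something
   to pair. No L/R swap is needed when rounds are unrolled in pairs. */
static void bf_two_rounds_x2(const struct bf_ctx *a, const struct bf_ctx *b,
                             int n,
                             uint32_t *La, uint32_t *Ra,
                             uint32_t *Lb, uint32_t *Rb)
{
    uint32_t la = *La ^ a->P[n];
    uint32_t lb = *Lb ^ b->P[n];
    uint32_t ra = *Ra ^ bf_f(a, la);
    uint32_t rb = *Rb ^ bf_f(b, lb);

    ra ^= a->P[n + 1];
    rb ^= b->P[n + 1];
    la ^= bf_f(a, ra);
    lb ^= bf_f(b, rb);

    *La = la; *Ra = ra;
    *Lb = lb; *Rb = rb;
}
```

This is only a sketch of the structure. How a compiler schedules the C code says nothing about the hand-written assembly, where each instruction is placed on the ALU or FPU pipeline explicitly.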