Date: Tue, 30 Jul 2013 00:44:09 +0200
From: Katja Malvoni <>
Subject: Re: Parallella: bcrypt


I moved the code to a separate assembly file, and my code from yesterday
worked. I implemented the whole of BF_encrypt2() in assembly.
There are not enough registers to preload both P arrays, so I'm preloading
only one. Speed is 1175 c/s.
Code is in

I left two rounds in one macro. Since there need to be 4 cycles between an
FPU instruction and the ALU instruction following it to get dual issue, with
only one round per macro it's not possible to have the shift right by 22 on
the ALU and FPU for both instances and to use iadd at the end. With two
rounds in one macro, one shift right by 22 and one add are not parallelised
per macro.
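
For readers following along, here is a rough C sketch of what two Blowfish
rounds over two independent instances look like. It is not the assembly from
this post. The names bf_ctx, sbox_at, bf_f and bf_two_rounds_x2 are made up
for illustration, and the P indexing assumes the common formulation where
P[0] has already been XORed into L. I am also assuming that the "shift right
by 22" above is the usual pre-scaled S-box index, (x >> 22) & 0x3FC, which
equals 4 * (x >> 24) and can be used directly as a byte offset.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-instance state: four S-boxes and the P array. */
typedef struct {
    uint32_t S[4][256];
    uint32_t P[18];
} bf_ctx;

/* Load a 32-bit S-box entry using a byte offset that is already 4 * index. */
static uint32_t sbox_at(const uint32_t *box, uint32_t byte_off)
{
    uint32_t v;
    memcpy(&v, (const unsigned char *)box + byte_off, sizeof v);
    return v;
}

/* Blowfish F: ((S0[a] + S1[b]) ^ S2[c]) + S3[d], with pre-scaled indices. */
static uint32_t bf_f(const bf_ctx *c, uint32_t x)
{
    uint32_t o0 = (x >> 22) & 0x3FC;   /* 4 * (x >> 24)          */
    uint32_t o1 = (x >> 14) & 0x3FC;   /* 4 * ((x >> 16) & 0xFF) */
    uint32_t o2 = (x >> 6) & 0x3FC;    /* 4 * ((x >> 8) & 0xFF)  */
    uint32_t o3 = (x << 2) & 0x3FC;    /* 4 * (x & 0xFF)         */
    return ((sbox_at(c->S[0], o0) + sbox_at(c->S[1], o1))
            ^ sbox_at(c->S[2], o2)) + sbox_at(c->S[3], o3);
}

/*
 * Two rounds (n and n + 1) for two instances a and b, interleaved so that
 * the work of one instance can overlap with the other. lr[0] is L, lr[1] is R.
 */
static void bf_two_rounds_x2(const bf_ctx *a, const bf_ctx *b,
                             uint32_t lr_a[2], uint32_t lr_b[2], int n)
{
    /* round n: R ^= P[n + 1] ^ F(L) */
    lr_a[1] ^= a->P[n + 1] ^ bf_f(a, lr_a[0]);
    lr_b[1] ^= b->P[n + 1] ^ bf_f(b, lr_b[0]);
    /* round n + 1: the roles of L and R swap, so no explicit swap is needed */
    lr_a[0] ^= a->P[n + 2] ^ bf_f(a, lr_a[1]);
    lr_b[0] ^= b->P[n + 2] ^ bf_f(b, lr_b[1]);
}
```

Putting two rounds in one unit means L and R swap roles inside the unit, so
no register moves are needed between rounds. How the individual shifts and
adds get paired for dual issue is up to the hand-written assembly and is not
modelled here.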

When I used r2 and r3 for L0 and R0, before preloading the P array, the speed
was 1083 c/s. When I changed to r48 and r56, it became 1136 c/s. I guess
that's because with r2 and r3 some instructions were 16-bit.

