Date: Tue, 30 Jul 2013 07:44:50 +0400
From: Solar Designer <>
Subject: Re: Parallella: bcrypt


On Tue, Jul 30, 2013 at 12:44:09AM +0200, Katja Malvoni wrote:
> I moved to a separate assembly file and my code from yesterday worked. I
> implemented the whole BF_encrypt2() in assembly.
> There are not enough registers to preload both P arrays, so I'm preloading
> only one. Speed is 1175 c/s.
> Code is in

Here are some more comments:

I think you may/should use STRD and LDRD (double-register) instructions
to save/load registers on/from the stack, and to preload P.  You'll then
need half as many instructions (and cycles?) for that.  Of course, this
is outside the loop, so it's not a big deal.
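
To illustrate the instruction count only, here is a rough C sketch (not
the actual Epiphany code).  The function names are made up for this
example, and whether a given compiler really emits LDRD/STRD for the
64-bit moves is an assumption that depends on the target and on 8-byte
alignment of the data.

```c
#include <stdint.h>
#include <string.h>

#define P_WORDS 18 /* Blowfish P array: 18 32-bit subkeys */

/* Hypothetical helper: one 32-bit move per subkey, 18 load/store pairs. */
static void preload_p_single(uint32_t dst[P_WORDS], const uint32_t src[P_WORDS])
{
    for (int i = 0; i < P_WORDS; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: one 64-bit move per two subkeys, 9 load/store pairs.
 * memcpy() keeps this free of aliasing and alignment assumptions in C. */
static void preload_p_double(uint32_t dst[P_WORDS], const uint32_t src[P_WORDS])
{
    for (int i = 0; i < P_WORDS; i += 2) {
        uint64_t pair;
        memcpy(&pair, &src[i], sizeof(pair));  /* load two adjacent subkeys */
        memcpy(&dst[i], &pair, sizeof(pair));  /* store them together */
    }
}
```

Both helpers produce the same copy of P; the second one just does it in
half as many moves.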

The four MOVs near the end of each loop can probably be avoided by
having the last Blowfish round write into different registers right
away.  Sure, we do have an equivalent of these MOVs in C source and we
have the MOVs in x86 asm code for bcrypt, but on archs with 3-operand
instructions I think they're avoidable.  Of course, this cost is only
incurred once per 16 Blowfish rounds, so it's not a big deal.
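
A minimal C sketch of the idea, under stated assumptions: bf_ctx,
bf_fill_demo() and both function names are hypothetical, and the tables
are filled with placeholder values rather than real Blowfish constants or
a real key schedule.  The first function swaps the halves explicitly; the
second alternates the roles of the two halves and lets the last round
write its result directly into the outputs, which is what a 3-operand
instruction can do with registers.

```c
#include <stdint.h>

/* Hypothetical key-dependent state (normally produced by key setup). */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx;

/* Placeholder fill so the sketch is runnable; NOT a real key schedule. */
static void bf_fill_demo(bf_ctx *c, uint32_t seed)
{
    uint32_t *w = (uint32_t *)c;
    for (unsigned i = 0; i < sizeof(*c) / sizeof(*w); i++)
        w[i] = seed = seed * 1664525u + 1013904223u;
}

/* The Blowfish F function on the four bytes of x. */
static uint32_t bf_f(const bf_ctx *c, uint32_t x)
{
    return ((c->S[0][x >> 24] + c->S[1][(x >> 16) & 0xff])
        ^ c->S[2][(x >> 8) & 0xff]) + c->S[3][x & 0xff];
}

/* Textbook form: the halves are swapped after every round. */
static void bf_encrypt_swap(const bf_ctx *c, uint32_t *xl, uint32_t *xr)
{
    uint32_t l = *xl, r = *xr, t;
    for (int i = 0; i < 16; i++) {
        l ^= c->P[i];
        r ^= bf_f(c, l);
        t = l; l = r; r = t;
    }
    t = l; l = r; r = t; /* undo the swap done by the last round */
    r ^= c->P[16];
    l ^= c->P[17];
    *xl = l;
    *xr = r;
}

/* Swap-free form: rounds alternate which half they update, and the
 * last round writes into the output locations instead of l and r. */
static void bf_encrypt_noswap(const bf_ctx *c, uint32_t *xl, uint32_t *xr)
{
    uint32_t l = *xl ^ c->P[0], r = *xr;
    for (int i = 0; i < 14; i += 2) {        /* rounds 0..13 */
        r ^= bf_f(c, l) ^ c->P[i + 1];
        l ^= bf_f(c, r) ^ c->P[i + 2];
    }
    r ^= bf_f(c, l) ^ c->P[15];              /* round 14 */
    *xr = l ^ bf_f(c, r) ^ c->P[16];         /* round 15, new destination */
    *xl = r ^ c->P[17];                      /* final XOR, new destination */
}
```

Feeding the outputs back in as the next inputs, as the bcrypt loops do,
gives identical results from both forms.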

