Date: Wed, 31 Jul 2013 06:19:08 +0400
From: Solar Designer <>
Subject: Re: Parallella: bcrypt


On Tue, Jul 30, 2013 at 10:57:09PM +0200, Katja Malvoni wrote:
> I implemented preload of second P array, code is committed. I got one
> register by reusing one of temporaries and I got another one by changing
> two offsets for one ptr. I'm getting 1192 c/s. I expected higher speed and
> I think this is because something is not dual-issued for the second
> instance second BF_ROUND in macro. At the end of the macro, load from P
> array was ensuring 4 cycles separation between iadd and corresponding eor.
> I still haven't figured out what is not dual-issued and why.

It looks like your code terminates abruptly, with no return from the
function.  I am surprised it works at all.  Does it possibly hit another
function body, execute that, and then return using its epilogue? %-)

Another thing I noticed is that you're not yet using LDRD to preload P's
(instead, you preload the elements one by one).  I think you should.
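To illustrate the idea at the C level (only a rough analogy, not the Epiphany assembly in question): the sketch below fetches two adjacent 32-bit P elements with one 8-byte access instead of two 4-byte ones.  The names bf_key_sketch and load_p_pair are made up for this example, and the 8-byte alignment is an assumption about what a doubleword load needs.  Whether a given compiler actually turns the copy into a single LDRD is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

#define BF_ROUNDS 16

/* Hypothetical container for the Blowfish P array.  The 8-byte alignment
 * is assumed so that each even-indexed pair can be fetched by one
 * doubleword load. */
typedef struct {
    _Alignas(8) uint32_t P[BF_ROUNDS + 2];
} bf_key_sketch;

/* Fetch P[i] and P[i + 1] together; i must be even and at most BF_ROUNDS.
 * The single 8-byte memcpy is what a compiler may lower to one doubleword
 * load instead of two separate word loads. */
static inline void load_p_pair(const bf_key_sketch *key, unsigned i,
                               uint32_t *p0, uint32_t *p1)
{
    uint32_t pair[2];

    memcpy(pair, &key->P[i], sizeof(pair));
    *p0 = pair[0];
    *p1 = pair[1];
}
```

Since BF_ROUNDS + 2 is even, the 18 P elements split into 9 such pairs with nothing left over.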

