Date: Wed, 31 Jul 2013 06:19:08 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

On Tue, Jul 30, 2013 at 10:57:09PM +0200, Katja Malvoni wrote:
> I implemented preload of second P array, code is committed. I got one
> register by reusing one of temporaries and I got another one by changing
> two offsets for one ptr. I'm getting 1192 c/s. I expected higher speed and
> I think this is because something is not dual-issued for the second
> instance second BF_ROUND in macro. At the end of the macro, load from P
> array was ensuring 4 cycles separation between iadd and corresponding eor.
> I still haven't figured out what is not dual-issued and why.

It looks like your code terminates abruptly, with no return from the
function.  I am surprised it works at all.  Does it possibly hit another
function body, execute that, and then return using its epilogue? %-)

https://github.com/kmalvoni/JohnTheRipper/blob/master/src/parallella_e_bcrypt.s

Another thing I noticed is that you're not yet using LDRD to preload P's
(instead, you preload the elements one by one).  I think you should.
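
To illustrate the idea in portable C rather than Epiphany assembly, here is
a rough sketch of fetching two adjacent 32-bit P entries with one 8-byte
load, which is roughly what a single LDRD into a register pair would do.
The names (p_pair, load_p_pair, preload_p) are made up for this example and
are not from your code.  On the actual hardware I'd expect the address to
need 8-byte alignment and the destination to be an even/odd register pair,
so please check that against the architecture reference.

```c
#include <stdint.h>
#include <string.h>

#define BF_P_WORDS 18  /* Blowfish P array: 18 32-bit subkeys */

/* Two adjacent P entries, in memory order (hypothetical helper type). */
typedef struct {
    uint32_t even;  /* P[2*i]     */
    uint32_t odd;   /* P[2*i + 1] */
} p_pair;

/* Fetch P[2*i] and P[2*i+1] with a single 8-byte copy.  A compiler can
 * turn this memcpy into one doubleword load when alignment allows. */
static p_pair load_p_pair(const uint32_t *P, unsigned i)
{
    p_pair r;
    memcpy(&r, &P[2 * i], sizeof r);
    return r;
}

/* Preload the whole P array using 9 paired loads instead of 18 single
 * loads. */
static void preload_p(const uint32_t *P, uint32_t out[BF_P_WORDS])
{
    for (unsigned i = 0; i < BF_P_WORDS / 2; i++) {
        p_pair r = load_p_pair(P, i);
        out[2 * i] = r.even;
        out[2 * i + 1] = r.odd;
    }
}
```

The point is only that the number of load instructions is halved; whether
that frees up issue slots in your macro is something to measure.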

Alexander
