Date: Sun, 21 Jul 2013 15:40:30 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Hi Alexander,

Since the Epiphany code became much smaller once I integrated it with JtR, I
tried using internal.ldf instead of fast.ldf, and the code now fits in local
memory. The speed is 932 c/s (sometimes 934 c/s). The BF_fmt port was slow at
first because I didn't copy the salt into local memory; once I did that, the
speed was 790 c/s. Then I switched to internal.ldf and got 932 c/s. If I try
to interleave two instances of bcrypt, the code no longer fits in local
memory. At the moment, interleaving two instances of bcrypt doesn't work: it
fails the self-test on get_hash[0](1). Should I pursue this approach further
or not?

On Sun, Jul 21, 2013 at 2:25 AM, Solar Designer <solar@...nwall.com> wrote:

> Yes, I also think that copying within local memory is faster.  However,
> we may want to optimize the memcpy().  The existing implementation is
> too generic - it doesn't use the 64-bit dual-register load/store
> instructions, it has little unrolling, and it includes support for sizes
> that are not a multiple of 4.  (This is from a quick glance at
> "e-objdump -d parallella_e_bcrypt.elf".)  Can you try creating a simpler
> specialized implementation instead, which would use the ldrd/strd insns
> and greater unrolling (e.g., 32 ldrd/strd pairs times 16 loop iterations,
> for a total of 512 x 64-bit data copies, or 4096 bytes total)?  Also,
> re-order the instructions such that the store is not attempted
> immediately after its corresponding load, to hide its latency - e.g.:
> [...]
>

Ok, I'll try this.
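
As a starting point, here is a rough portable C sketch of the idea, not the
final Epiphany code. The name copy64_unrolled is hypothetical, and it is only
an assumption that e-gcc would turn the 64-bit accesses into ldrd/strd pairs;
that would need checking with e-objdump, and hand-written assembly may be
needed instead. For brevity it unrolls 4 pairs per iteration rather than the
suggested 32, and it issues all loads of a group before any of the stores so
that no store immediately follows its own load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical specialized copy (not part of JtR or the Epiphany SDK).
 * Assumes: 8-byte-aligned, non-overlapping buffers and a size that is
 * a multiple of 32 bytes, so no tail handling is needed.
 */
static void copy64_unrolled(uint64_t *dst, const uint64_t *src, size_t bytes)
{
    size_t n;

    assert(bytes % (4 * sizeof(uint64_t)) == 0);
    n = bytes / (4 * sizeof(uint64_t));

    while (n--) {
        /* Group the loads first ... */
        uint64_t a = src[0];
        uint64_t b = src[1];
        uint64_t c = src[2];
        uint64_t d = src[3];

        /* ... then the stores, to give each load time to complete. */
        dst[0] = a;
        dst[1] = b;
        dst[2] = c;
        dst[3] = d;

        src += 4;
        dst += 4;
    }
}
```

For the 4096-byte case this would be called as copy64_unrolled(dst, src, 4096),
which runs 128 iterations of 4 copies each.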

Katja
