Date: Mon, 21 Jul 2014 10:30:32 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Katja,

On Sun, Jul 20, 2014 at 04:10:01PM +0200, Katja Malvoni wrote:
> I'll go for 5 BRAMs/instance with storing initial S-box values across
> unused halves of 4 BRAMs holding S-boxes. This way, initialization will
> require 256 clock cycles.

Did you mean 512 clock cycles?

When you're using only 4 (out of 5) BRAMs for initial S-box values, and
those BRAMs also contain the actual S-boxes, you're limited to a total
of 8 BRAM accesses per cycle.  This can be 4 reads and 4 writes.  If so,
with two bcrypt instances to initialize and doing 4 writes per cycle,
you need 512 cycles to write them all (a total of 2048 32-bit values).
A different split between reads and writes can help: since the initial
S-box contents are the same for the two instances, each value read can
be written to both, so using an equal number of reads and writes per
cycle isn't optimal in terms of clock cycles needed (but might be
optimal in terms of simplicity and resource utilization).
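
As a rough sanity check of the arithmetic above, here is a small Python
sketch of a simplified port-budget model.  The model itself is an
assumption for illustration: it only counts accesses per cycle (4
dual-port BRAMs give 8), lets each value read be written to at most both
instances, and ignores which BRAM each word lives in as well as any
buffering needed between reads and writes.  The name init_cycles is
hypothetical, not something from the actual design.

```python
from math import ceil

# Assumed model parameters, taken from the figures in this message.
PORTS = 4 * 2                  # 4 dual-port BRAMs -> 8 accesses per cycle
INSTANCES = 2                  # bcrypt instances initialized together
WORDS_PER_INSTANCE = 4 * 256   # four S-boxes of 256 32-bit entries each
TOTAL_WRITES = INSTANCES * WORDS_PER_INSTANCE  # 2048 values to write


def init_cycles(reads_per_cycle):
    """Cycles to write all S-box words for a given number of reads per cycle.

    Writes per cycle are limited both by the ports left over after the
    reads and by the fact that one value read can feed at most one write
    per instance.
    """
    if not 1 <= reads_per_cycle < PORTS:
        raise ValueError("need at least one read and one write port")
    writes_per_cycle = min(PORTS - reads_per_cycle,
                           INSTANCES * reads_per_cycle)
    return ceil(TOTAL_WRITES / writes_per_cycle)


if __name__ == "__main__":
    for reads in range(1, PORTS):
        print(reads, "reads/cycle ->", init_cycles(reads), "cycles")
```

Under these assumptions, 4 reads and 4 writes per cycle gives the 512
cycles mentioned above, 2 reads feeding 4 writes gives the same count
with two ports left idle, and an uneven 3/5 split comes out lower.
Whether an uneven split is worth the extra control logic is a separate
question that this model does not address.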

> I'm storing 2 S-boxes in higher half of each of 4
> BRAMs. Initialization data is stored twice but I can copy it in parallel
> for both instances. I don't use additional BRAMs and although utilization
> will be higher, it won't impact max core count (wider buses were used in
> 112 instances approach and core count was limited by available BRAM).

I think you can copy in parallel for both instances while reading a
shared copy.  No need to store (and read) the initial values twice per
BRAM for that.

Anyway, 512 clock cycles is low enough for our current experiments.
It's only 1.5% of total computation time for cost 5.

Alexander
