Date: Sun, 6 Jul 2014 12:15:28 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Hi Katja,

On Sat, Jul 05, 2014 at 08:36:07PM +0200, Katja Malvoni wrote:
> There was progress with bcrypt on ZedBoard but I haven't found time to
> report it earlier.

Thank you for posting this status update in here!

> At the moment, performance for cost 5 is 3754 c/s achieved by offloading
> more work on FPGA.

That's a pretty good speed, comparable to e.g. Core 2 Quad CPUs, but at
much lower power usage.

> Earlier, only the most costly loop was implemented in
> FPGA and performance was limited by computation done on host and
> communication overhead. Now, bcrypt hash is completely calculated on FPGA.
> However, performance is still limited by communication overhead because I
> transfer initial S-box values to the FPGA from host. I also avoid cmp_exact
> (it only returns 1 if computing bcrypt on FPGA). Next step is to avoid
> initial S-box transfers and store them in BRAM instead.

Right.  As discussed off-list, I think you need to store the initial
S-box contents separately in each core, in the currently unused portions
of BRAMs that you use for P-boxes and expanded keys.  That way,
initialization can proceed in parallel and locally to each core.

> On higher cost settings, performance is: 678.6 c/s for cost 8, 179.1
> c/s for cost 10 and 45.42 c/s for cost 12.

Thanks.  These translate to the following theoretical speeds for cost 5:

(2^8*1024+585)/(2^5*1024+585)*678.6 = 5345
(2^10*1024+585)/(2^5*1024+585)*179.1 = 5634
(2^12*1024+585)/(2^5*1024+585)*45.42 = 5713

There are 9+512 Blowfish encryptions before the costly loop, and 64 or
192 after it (let's use 64, since the remaining hash bits don't have to
be computed during an attack most of the time).  That's 585 more.  For
cost 5, this increases the total from 32768 to 33353, or by 1.8%.
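
For reference, the scaling above can be reproduced with a short Python
sketch.  The function names are mine and purely illustrative; the model
(2^cost * 1024 Blowfish encryptions in the costly loop plus 585 outside
of it) is the approximation used in the formulas above, not an exact
cycle count for the FPGA design.

```python
# Blowfish encryptions outside the costly loop: 9 + 512 before it,
# plus 64 after it (assuming only 64 bits of the hash are computed).
OVERHEAD = 9 + 512 + 64  # 585


def encryptions(cost):
    """Approximate Blowfish encryptions per bcrypt hash at a given cost."""
    return 2 ** cost * 1024 + OVERHEAD


def scale_speed(measured_cps, from_cost, to_cost=5):
    """Project a speed measured at from_cost onto to_cost."""
    return measured_cps * encryptions(from_cost) / encryptions(to_cost)


def overhead_percent(cost):
    """Extra work outside the costly loop, as a percentage of the loop."""
    return 100.0 * OVERHEAD / (2 ** cost * 1024)


# Speeds reported in the quoted message, in c/s, keyed by cost setting.
measured = {8: 678.6, 10: 179.1, 12: 45.42}

for cost, cps in measured.items():
    print(f"cost {cost}: {scale_speed(cps, cost):.0f} c/s projected at cost 5")

print(f"overhead at cost 5: {overhead_percent(5):.1f}%")
```

The gap between these projections and the measured 3754 c/s at cost 5
is then an estimate of how much is being lost to per-hash overhead such
as the S-box transfers.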

I guess you're computing 64 bits per hash only, correct?  This is
sufficiently unlikely to cause false positives that we can go with it.
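
As a rough illustration of why 64 bits is enough, here is a small
Python sketch of the expected number of false positives.  It assumes
the 64 compared bits behave as uniformly random for wrong candidates,
and the workload numbers (one year at the projected 5713 c/s against a
single hash) are hypothetical, chosen only to give a sense of scale.

```python
def expected_false_positives(candidates, loaded_hashes, bits=64):
    """Expected partial-hash matches among wrong candidates.

    Assumes each wrong candidate matches a given loaded hash on the
    compared bits with probability 2**-bits (uniform-output assumption).
    """
    return candidates * loaded_hashes / 2 ** bits


# Hypothetical workload: one year of continuous cracking at 5713 c/s
# against a single loaded hash.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
candidates = 5713 * SECONDS_PER_YEAR

print(expected_false_positives(candidates, loaded_hashes=1))
```

Any match on the 64 bits can still be confirmed by computing the full
hash on the host, so a rare false positive costs very little.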

> I tried measuring Zynq voltage during computation and it does drop so
> initial assumption is correct. Voltage drops from 960 mV to around 890 mV
> during computation when Zynq FPGA is fully utilized.

Is this on my ZedBoard or on yours?  Does the code even work on yours?
Even if not, can you nevertheless obtain a voltage reading from it, or
does it crash the board?  Same question about your Parallella board.

How many bcrypt instances and cores is this?  Still 2 cycles/round,
right?  What clock rate?

Thanks again,

Alexander
