Date: Sun, 20 Jul 2014 05:12:10 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Katja,

On Sat, Jul 19, 2014 at 07:10:09PM +0200, Katja Malvoni wrote:
> On 17 July 2014 00:03, Solar Designer <solar@...nwall.com> wrote:
> 
> > > I'll implement 56 instances with 4 BRAMs per core and see if these will
> > > perform as expected.
> >
> > Yes, please.
> 
> Implemented. 4571 c/s for cost 5, 64.51 c/s for cost 12.

Cool!

What clock rate?

As I noted in http://www.openwall.com/lists/john-dev/2014/04/21/9
"computation might be slightly slower: it's add, xor, add done
sequentially on the same cycle" - but I guess this hasn't affected your
clock rate yet, since the clock rate was limited by the longest path
used during initialization anyway?
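
For reference, the "add, xor, add" chain is Blowfish's F function.  A
minimal software sketch of it follows; the S-box contents here are
placeholders made up for illustration (not the real Blowfish
constants), and the names blowfish_f and demo_sbox are hypothetical.
In hardware the three operations would sit on one combinational path,
which is why they could limit the clock rate.

```python
MASK32 = 0xFFFFFFFF

def blowfish_f(x, sbox):
    """Blowfish F: ((S0[a] + S1[b]) ^ S2[c]) + S3[d], additions mod 2**32."""
    a = (x >> 24) & 0xFF
    b = (x >> 16) & 0xFF
    c = (x >> 8) & 0xFF
    d = x & 0xFF
    t = (sbox[0][a] + sbox[1][b]) & MASK32   # first add
    t ^= sbox[2][c]                          # xor
    return (t + sbox[3][d]) & MASK32         # second add

# Placeholder S-boxes for illustration only (NOT the real constants).
demo_sbox = [[(i * 0x01010101 + k) & MASK32 for i in range(256)]
             for k in range(4)]
demo_out = blowfish_f(0x01020304, demo_sbox)
print(hex(demo_out))
```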

> While testing with
> cost 12, the zed system rebooted. I guess it's overheating since you modded
> the board so it shouldn't be voltage drop problem.

Yes, as I mentioned to you via private e-mail, in the current plastic
box the board overheats when Zynq PL is in full use for more than a
couple of minutes.  I'll look into adding a fan (perhaps 40mm, like old
graphics cards had).

> Next step is to make this 8 BRAMs per core and to avoid initial S-box
> transfers :-)

Please describe the exact layout you intend for 8 BRAMs.

Right now, you use 5 BRAMs per core, with 2 bcrypt instances per core,
correct?  Out of these, 4 BRAMs are half-empty, and 1 BRAM is mostly
empty but not empty enough to put the entire initial S-box values in
there.
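
To make the occupancy arithmetic explicit, here is a rough sketch.  It
assumes each BRAM is used as 1024 x 32 bits, that each of the 4 S-box
BRAMs holds one S-box for each of the 2 instances, and that the 5th
BRAM holds the P-arrays plus other data that I do not count here.
These are my assumptions about the layout, not a description of your
actual design.

```python
WORDS_PER_BRAM = 1024   # assumption: one BRAM used as 1024 x 32 bits
SBOX_WORDS = 256        # one Blowfish S-box: 256 x 32 bits
NUM_SBOXES = 4
P_WORDS = 18            # Blowfish P-array: 18 x 32 bits

def current_layout(instances=2):
    """Fill fraction of each BRAM in a 5-BRAM core (assumed layout)."""
    sbox_bram = instances * SBOX_WORDS / WORDS_PER_BRAM  # S-box i of each instance
    fifth_bram = instances * P_WORDS / WORDS_PER_BRAM    # P-arrays only
    return [sbox_bram] * NUM_SBOXES + [fifth_bram]

fills = current_layout()
free_in_fifth = WORDS_PER_BRAM - 2 * P_WORDS       # upper bound on free words
initial_sbox_words = NUM_SBOXES * SBOX_WORDS       # full set of initial values
print(fills)
print(free_in_fifth, initial_sbox_words)
```

Under these assumptions the initial S-box values need a full BRAM's
worth of words, which is more than the 5th BRAM has free even before
counting whatever else is stored there.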

If you combine two such cores together (10 BRAMs, 4 instances), you'll
have two mostly empty BRAMs per core, and you'll be able to fit the
initial S-box values in there - and still be able to proceed further to
double the number of instances per core to hide BRAM latencies.

Alternatively, you may fit the initial S-box values in the currently
unused halves of the 4 S-box BRAMs (then they'll be 3/4 full), while
staying at 5 BRAMs and 2 bcrypt instances per core.  (Or you may spread
the initial values across the 5 BRAMs differently, to also use the 5th
BRAM's ports for quicker initialization and/or to have fewer MUXes for
the S-box BRAMs, if that turns out to be the case for such a design.)
Then you won't be able to proceed to double the number of instances per
core without re-designing the initialization, but on the other hand,
with smaller cores like this, routing delays might be smaller.
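
A rough capacity check of the two options, as a sketch only.  It
assumes 1024 x 32 bits per BRAM and counts only S-boxes and P-arrays;
it says nothing about ports, MUXes, or routing, and the function names
are hypothetical.

```python
WORDS_PER_BRAM = 1024   # assumption: one BRAM used as 1024 x 32 bits
SBOX_WORDS = 256        # one Blowfish S-box
NUM_SBOXES = 4
P_WORDS = 18            # Blowfish P-array
INIT_WORDS = NUM_SBOXES * SBOX_WORDS   # one copy of the initial S-boxes

def merged_cores():
    """10 BRAMs, 4 instances; initial values go into the 2 spare BRAMs."""
    spare_words = 2 * WORDS_PER_BRAM - 4 * P_WORDS   # only P-arrays counted
    used_per_sbox_bram = 2 * SBOX_WORDS              # 2 instances per S-box BRAM
    return {
        "init_fits": INIT_WORDS <= spare_words,
        "sbox_bram_fill": used_per_sbox_bram / WORDS_PER_BRAM,
        "can_double_instances": 2 * used_per_sbox_bram <= WORDS_PER_BRAM,
    }

def init_in_sbox_brams():
    """5 BRAMs, 2 instances; each S-box BRAM also holds one initial S-box."""
    used = 2 * SBOX_WORDS + SBOX_WORDS
    doubled = 4 * SBOX_WORDS + SBOX_WORDS            # 4 instances + initial copy
    return {
        "init_fits": used <= WORDS_PER_BRAM,
        "sbox_bram_fill": used / WORDS_PER_BRAM,
        "can_double_instances": doubled <= WORDS_PER_BRAM,
    }

print(merged_cores())
print(init_in_sbox_brams())
```

By this count, both options have room for the initial values, but only
the merged layout leaves the S-box BRAMs with space for twice as many
instances.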

The performance difference between these may come from initialization
time (for low bcrypt cost) and clock rate (for any bcrypt cost).
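
As a sanity check on the initialization effect, one can compare the
ratio of the two speeds you reported against the ratio of Blowfish
block encryptions that bcrypt performs at each cost.  This is a sketch:
the operation count is the standard one for bcrypt, and the function
name is hypothetical.

```python
def blowfish_encryptions(cost):
    """Approximate Blowfish block encryptions per bcrypt hash."""
    per_expansion = 521                 # 9 for the P-array + 512 for the S-boxes
    expansions = 1 + 2 * (1 << cost)    # initial salted one + 2 per loop iteration
    final = 64 * 3                      # 64 passes over three 64-bit blocks
    return per_expansion * expansions + final

ideal_ratio = blowfish_encryptions(12) / blowfish_encryptions(5)
measured_ratio = 4571 / 64.51           # the cost 5 and cost 12 c/s figures
print(round(ideal_ratio, 1), round(measured_ratio, 1))
```

If the measured ratio comes out well below the ideal one, that would be
consistent with a fixed per-hash overhead (initialization, transfers)
that matters at cost 5 and is negligible at cost 12, although other
explanations are possible.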

Neither of these is 8 BRAMs/core.  So what is your plan?

Thanks,

Alexander
