Date: Sun, 20 Jul 2014 16:10:01 +0200
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

On 20 July 2014 03:12, Solar Designer <solar@...nwall.com> wrote:

> On Sat, Jul 19, 2014 at 07:10:09PM +0200, Katja Malvoni wrote:
> > On 17 July 2014 00:03, Solar Designer <solar@...nwall.com> wrote:
> >
> > > > I'll implement 56 instances with 4 BRAMs per core and see if these
> > > > will perform as expected.
> > >
> > > Yes, please.
> >
> > Implemented. 4571 c/s for cost 5, 64.51 c/s for cost 12.
>
> Cool!
>
> What clock rate?
>

71 MHz, as before (actually a bit more: 71.4 MHz). The maximum frequency
reported by the Xilinx tool is 74.2 MHz, but the first available frequency
above 71 MHz is 76.9 MHz. That might work, but I haven't tried it.
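
As a sanity check on the numbers above (my own back-of-the-envelope math,
not a measurement): cost 12 does 2^7 = 128 times the variable work of cost
5, yet the measured throughput ratio is only 4571 / 64.51 ~= 71, which
suggests a sizable fixed per-hash cost. Assuming all 56 instances run in
parallel, so per-hash latency is roughly 56 / throughput:

    /* Back-of-the-envelope estimate of the fixed per-hash cost.
       Assumes throughput = 56 instances / per-hash latency and that
       only the 2^cost loop scales with cost - a simplified model. */
    #include <stdio.h>

    int main(void)
    {
        double t5 = 56.0 / 4571.0;      /* ~12.3 ms/hash at cost 5 */
        double t12 = 56.0 / 64.51;      /* ~868 ms/hash at cost 12 */
        double w5 = (t12 - t5) / 127.0; /* variable part at cost 5 */
        double t0 = t5 - w5;            /* fixed part (setup, transfers) */
        printf("variable: %.2f ms, fixed: %.2f ms\n", w5 * 1e3, t0 * 1e3);
        return 0;
    }

If this simple model holds, the fixed part comes out to roughly 5.5 ms per
hash at cost 5, which fits with the initial S-box transfers being the next
thing worth eliminating.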


> As I noted in http://www.openwall.com/lists/john-dev/2014/04/21/9
> "computation might be slightly slower: it's add, xor, add done
> sequentially on the same cycle" - but I guess this didn't affect your
> clock rate yet since the clock rate was limited by longest path used
> during initialization anyway?
>

Yes, that's right.
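
(For reference, the "add, xor, add" is how the four S-box lookups are
combined in Blowfish's F function - in C, with S as the 4x256 S-box array,
roughly:)

    #include <stdint.h>

    extern uint32_t S[4][256];

    /* The two adds and one xor combining the four S-box outputs;
       in the hardware these are chained on the same cycle. */
    static uint32_t F(uint32_t x)
    {
        return ((S[0][x >> 24] + S[1][(x >> 16) & 0xff])
                 ^ S[2][(x >> 8) & 0xff]) + S[3][x & 0xff];
    }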


> > Next step is to make this 8 BRAMs per core and to avoid initial S-box
> > transfers :-)
>
> Please describe the exact layout you intend for 8 BRAMs.
>

Sorry, I meant 10 BRAMs.


> Right now, you use 5 BRAMs per core, with 2 bcrypt instances per core,
> correct?  Out of these, 4 BRAMs are half-empty, and 1 BRAM is mostly
> empty but not empty enough to put the entire initial S-box values in
> there.
>
> If you combine two such cores together (10 BRAMs, 4 instances), you'll
> have two mostly empty BRAMs per core, and you'll be able to fit the
> initial S-box values in there - and still be able to proceed further to
> double the number of instances per core to hide BRAM latencies.
>
> Alternatively, you may fit the initial S-box values in the currently
> unused halves of the 4 S-box BRAMs (then they'll be 3/4 full), while
> staying at 5 BRAMs and 2 bcrypt instances per core.  (Or you may spread
> the initial values across the 5 BRAMs differently, to also use the 5th
> BRAM's ports for quicker initialization or/and to have fewer MUXes for
> the S-box BRAMs if that turns out to be the case for such design.)
> Then you won't be able to proceed to double the number of instances per
> core without re-designing the initialization, but on the other hand with
> smaller cores like this routing delays might be smaller.
>
> The performance difference between these may come from initialization
> time (for low bcrypt cost) and clock rate (for any bcrypt cost).
>
> Neither of these is 8 BRAMs/core.  So what is your plan?


I'll go for 5 BRAMs per core (2 instances), storing the initial S-box
values in the unused halves of the 4 BRAMs holding the S-boxes. This way,
initialization will take 256 clock cycles. I'm storing 2 initial S-boxes
in the upper half of each of the 4 BRAMs, so the initialization data is
stored twice, but that lets me copy it in parallel for both instances. I
don't use any additional BRAMs, and although BRAM utilization will be
higher, it won't impact the maximum core count (wider buses were used in
the 112-instance approach, and core count was limited by available BRAM
there).
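
To make the layout concrete, here is a rough software model of one core's
S-box BRAMs (my simplification in C; the real thing is Verilog, and the
exact port usage during the copy is glossed over):

    /* Sketch of one of the 4 S-box BRAMs per core (simplified model,
       not the actual design). Lower half: live S-box i for both
       instances. Upper half: the initial values for S-box i, stored
       twice so both instances can be refilled by one linear pass. */
    #include <stdint.h>

    #define SBOX_WORDS 256              /* 256 x 32-bit entries */

    struct sbox_bram {
        uint32_t live[2][SBOX_WORDS];   /* lower half, one S-box per instance */
        uint32_t init[2][SBOX_WORDS];   /* upper half, initial values x2 */
    };

    /* One iteration of the outer loop per clock cycle in the intended
       hardware: all 4 BRAMs copy at once, and the duplicated initial
       data lets both instances' copies proceed in parallel, giving the
       256-cycle initialization. */
    static void init_core(struct sbox_bram bram[4])
    {
        for (int a = 0; a < SBOX_WORDS; a++)
            for (int i = 0; i < 4; i++) {
                bram[i].live[0][a] = bram[i].init[0][a];
                bram[i].live[1][a] = bram[i].init[1][a];
            }
    }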

Katja

