john-dev - Re: ZedBoard: bcrypt



Message-ID: <CA+EaD-ZeMtkgLsv27-BP4d1YbqSucwAQ3psreu7HoDJ85_58Og@mail.gmail.com>
Date: Tue, 5 Nov 2013 17:21:10 +0100
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Hi Alexander,

On Sun, Nov 3, 2013 at 11:02 PM, Solar Designer <solar@...nwall.com> wrote:

> Great!  I think your next step is to implement two instances of bcrypt
> per core, so that there are no wait-only cycles.  That is, in Cycle 1
> above you would be doing the same kind of work as on Cycle 0, but for
> the other instance.  You may use the currently wasted halves of the same
> RAM blocks (just set the most significant address bit when doing the
> memory accesses for the second bcrypt instance) or you may use separate
> RAM blocks - whichever results in lower utilization of other resources.
>

I have an implementation that works in simulation but not on the board.
However, its resource utilization is:

Register: 5%
LUT: 41%
Slice: 66%
RAMB36E1: 6%
RAMB18E1: 1%
BUFG: 3%

With numbers like these, there is no point in trying to find the bug(s).
I'll try to redesign the current implementation.

> > performance on the self-test for one core is 79 c/s, while
> > for 14 cores it's 765 c/s. For cost 12, these numbers are 0.6656 c/s for 1
> > core and 8.002 c/s for 14 cores. The overhead of loading data from shared
> > BRAM into per-core BRAMs is significant.
>
> I think it's not only the overhead of loading data, but also the
> overhead of host-side computation, which is not currently overlapped
> with computation on the FPGA.  Remember that you only implemented
> bcrypt's variable-cost loop on the FPGA, keeping some fixed-cost
> Blowfish stuff before and after this loop on the host CPU.  Although
> JtR's format interface currently requires that everything is in sync by
> the time crypt_all() returns (no precomputation for next set of
> candidate passwords possible at this point), you may nevertheless
> overlap host and FPGA computation most of the time by making
> max_keys_per_crypt several times higher and overlapping things inside of
> crypt_all(), except for the very last subset of candidate passwords.
>
> I suggest that you make this max_keys_per_crypt increase factor
> configurable - at least at compile-time, or it can even be chosen at
> runtime since the format's init() may modify max_keys_per_crypt.
>
> For example, with 14 cores and two bcrypt instances per core, you'd have
> min_keys_per_crypt at 28, but you may have max_keys_per_crypt at higher
> multiples of 28 - e.g., 112.  With that, you'd be able to overlap host
> and FPGA computation 3/4 of the time.
>
> Is the above explanation clear?  Please feel free to ask any questions
> you might have.
>

I think I understand; I might have additional questions once I start
implementing it.

Katja

