Date: Tue, 5 Nov 2013 17:21:10 +0100
From: Katja Malvoni <kmalvoni@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Hi Alexander,

On Sun, Nov 3, 2013 at 11:02 PM, Solar Designer <solar@...nwall.com> wrote:

> Great! I think your next step is to implement two instances of bcrypt
> per core, so that there are no wait-only cycles. That is, in Cycle 1
> above you would be doing the same kind of work as on Cycle 0, but for
> the other instance. You may use the currently wasted halves of the same
> RAM blocks (just set the most significant address bit when doing the
> memory accesses for the second bcrypt instance) or you may use separate
> RAM blocks - whichever results in lower utilization of other resources.

I have an implementation which works in simulation but not on the board.
However, utilization is:

Register: 5%
LUT: 41%
Slice: 66%
RAMB36E1: 6%
RAMB18E1: 1%
BUFG: 3%

With these numbers there is no point in trying to find bug(s). I'll try
to redesign the current implementation.

> > performance on self test for one core is 79 c/s while
> > for 14 cores it's 765 c/s. For cost 12 these numbers are 0.6656 c/s for 1
> > core and 8.002 c/s for 14 cores. Overhead of loading data from shared BRAM
> > into per core BRAMs is significant.
>
> I think it's not only the overhead of loading data, but also the
> overhead of host-side computation, which is not currently overlapped
> with computation on the FPGA. Remember that you only implemented
> bcrypt's variable-cost loop on the FPGA, keeping some fixed-cost
> Blowfish stuff before and after this loop on the host CPU.
> Although JtR's format interface currently requires that everything is
> in sync by the time crypt_all() returns (no precomputation for next set
> of candidate passwords possible at this point), you may nevertheless
> overlap host and FPGA computation most of the time by making
> max_keys_per_crypt several times higher and overlapping things inside of
> crypt_all(), except for the very last subset of candidate passwords.
>
> I suggest that you make this max_keys_per_crypt increase factor
> configurable - at least at compile-time, or it can even be chosen at
> runtime since the format's init() may modify max_keys_per_crypt.
>
> For example, with 14 cores and two bcrypt instances per core, you'd have
> min_keys_per_crypt at 28, but you may have max_keys_per_crypt at higher
> multiples of 28 - e.g., 112. With that, you'd be able to overlap host
> and FPGA computation 3/4th of the time.
>
> Is the above explanation clear? Please feel free to ask any questions
> you might have.

I think I understand. I might have additional questions once I start
implementing it.

Katja
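
The arithmetic in the quoted max_keys_per_crypt suggestion can be sketched as follows. This is only an illustration in Python, not JtR code: the function names and the `factor` parameter are hypothetical, and it assumes crypt_all() splits max_keys_per_crypt into subsets of min_keys_per_crypt, with every subset except the very last one overlapping host-side work.

```python
from fractions import Fraction


def keys_per_crypt(cores, instances_per_core, factor):
    """Return (min_keys, max_keys) for a hypothetical increase factor."""
    min_keys = cores * instances_per_core  # one candidate per bcrypt instance
    max_keys = min_keys * factor           # several FPGA-sized subsets per call
    return min_keys, max_keys


def overlap_fraction(min_keys, max_keys):
    """Fraction of subsets that can overlap host and FPGA computation.

    Assumption: all subsets but the very last one are overlapped.
    """
    subsets = max_keys // min_keys
    return Fraction(subsets - 1, subsets)


min_keys, max_keys = keys_per_crypt(cores=14, instances_per_core=2, factor=4)
print(min_keys, max_keys, overlap_fraction(min_keys, max_keys))  # 28 112 3/4
```

Under the same assumption, a larger factor pushes the overlapped share closer to 1 (a factor of 8 would give 7/8), while a factor of 1 leaves nothing to overlap.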