Message-ID: <20130909211249.GA20767@openwall.com>
Date: Tue, 10 Sep 2013 01:12:49 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard / Parallella: bcrypt

Katja,

On Mon, Sep 09, 2013 at 09:48:33PM +0200, Katja Malvoni wrote:
> I found out bug in verilog code and fixed it. Now it produces correct
> result for all test vectors.
>
> At the moment I'm using usleep(78125) instead of done signal

Thanks for these status updates!

So that's 12.8 c/s per bcrypt core for now.

Given the current FPGA utilization with and without this one bcrypt
core added, roughly how many such bcrypt cores would fit?

What clock rate are you using? What's the maximum clock rate for this
design on this device, as reported by ISE? Can you easily increase the
clock rate to be closer to that maximum?

I understand that these questions may be premature, yet I'd like us to
establish a performance baseline before proceeding with optimizations.

12.8 c/s roughly corresponds to the original Pentium at 75 MHz. Since
your clock rate is probably comparable to that, it's a fine speed to
have for totally unoptimized code. We just need to make it a few times
higher with optimizations, and use many cores.

Theoretically, going from block RAM count, we can have up to 140 bcrypt
cores in a Zynq 7020. If other FPGA resources permit us to have this
many cores (or at least more than 70), we need to keep four S-boxes per
block RAM, which means that with two ports we'll be able to do two S-box
lookups per core per cycle. Ideally, we'd have the data available the
next cycle, but even if the latency is somehow 2 cycles (such as because
we chose to enable registers in order to achieve a higher clock rate),
we can nevertheless proceed with the next two lookups right away half of
the time. This is because at the start of a Blowfish round, we have
indices for the four S-boxes, but we have to wait for all four lookup
results to be available before we can proceed with the next round.

On the other hand, if other resource utilization is such that we can't
reasonably hope to achieve 70+ cores, then we may use 2+ separate block
RAMs per core in order to increase the port count, and thus the number
of S-box lookups made at once.

Another aspect is the number of bcrypt instances per core, which is both
an input to and an outcome of our decision-making on better use of block
RAMs and on the number of cores.

Alexander
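
The arithmetic above can be sketched as a small back-of-the-envelope
model in Python. This is only an illustration under stated assumptions,
not a description of the actual Verilog design: the function names are
hypothetical, the block RAM is assumed to hold 36 Kbit, each hash is
assumed to cost exactly one usleep() wait, and the adds and XORs of a
Blowfish round are assumed to cost no extra cycles.

```python
# Back-of-the-envelope model. Every name and constant here is an
# illustrative assumption, not taken from the actual design.

def hashes_per_second(wait_us):
    """Rate when each hash costs one fixed wait of wait_us microseconds."""
    return 1_000_000 / wait_us

# Blowfish uses four S-boxes of 256 32-bit entries each.
SBOX_BITS = 4 * 256 * 32
# Assumed block RAM capacity: 36 Kbit.
BRAM_BITS = 36 * 1024

def sboxes_fit_in_one_bram():
    """True if all four S-boxes fit in one block RAM of the assumed size."""
    return SBOX_BITS <= BRAM_BITS

def cycles_per_round(ports, latency, lookups=4):
    """Cycles from issuing a round's first lookup until its last result.

    Assumes `ports` lookups can be issued per cycle and that each result
    arrives `latency` cycles after its lookup was issued. The next round
    cannot start earlier, since it needs all four results.
    """
    issue_cycles = -(-lookups // ports)  # ceiling division
    return (issue_cycles - 1) + latency

if __name__ == "__main__":
    print("c/s with usleep(78125):", hashes_per_second(78125))
    print("four S-boxes fit in one block RAM:", sboxes_fit_in_one_bram())
    for ports in (2, 4):
        for latency in (1, 2):
            print("ports", ports, "latency", latency,
                  "cycles per round", cycles_per_round(ports, latency))
```

In this simplified model, one dual-port block RAM per core gives 2
cycles per round at 1-cycle latency and 3 cycles at 2-cycle latency,
while two block RAMs per core (four ports) give 1 and 2 cycles.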