Date: Sun, 6 Jul 2014 12:15:28 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: ZedBoard: bcrypt

Hi Katja,

On Sat, Jul 05, 2014 at 08:36:07PM +0200, Katja Malvoni wrote:
> There was progress with bcrypt on ZedBoard but I haven't found time to
> report it earlier.

Thank you for posting this status update in here!

> At the moment, performance for cost 5 is 3754 c/s achieved by offloading
> more work on FPGA.

That's pretty good speed, comparable e.g. to Core 2 Quad CPUs, but at a
much lower power usage.

> Earlier, only the most costly loop was implemented in
> FPGA and performance was limited by computation done on host and
> communication overhead. Now, bcrypt hash is completely calculated on FPGA.
> However, performance is still limited by communication overhead because I
> transfer initial S-box values to the FPGA from host. I also avoid cmp_exact
> (it only returns 1 if computing bcrypt on FPGA). Next step is to avoid
> initial S-box transfers and store them in BRAM instead.

Right.  As discussed off-list, I think you need to store the initial
S-box contents separately in each core, in the currently unused portions
of BRAMs that you use for P-boxes and expanded keys.  That way,
initialization can proceed in parallel and locally to each core.

> On higher cost settings, performance is: 678.6 c/s for cost 8, 179.1
> c/s for cost 10 and 45.42 c/s for cost 12.

Thanks.  These translate to the following theoretical speeds for cost 5:

(2^8*1024+585)/(2^5*1024+585)*678.6 = 5345
(2^10*1024+585)/(2^5*1024+585)*179.1 = 5634
(2^12*1024+585)/(2^5*1024+585)*45.42 = 5713

There are 9+512 Blowfish encryptions before the costly loop, and 64 or
192 after it (let's use 64, since the remaining hash bits don't have to
be computed during attack most of the time).  That's 585 more.  For cost
5, this increases the total from 32768 to 33353, or by 1.8%.

I guess you're computing 64 bits per hash only, correct?
This is sufficiently unlikely to cause false positives that we can go
with it.

> I tried measuring Zynq voltage during computation and it does drop so
> initial assumption is correct. Voltage drops from 960 mV to around 890 mV
> during computation when Zynq FPGA is fully utilized.

Is this on my ZedBoard or on yours?  Does the code even work on yours?
Even if not, can you nevertheless obtain a voltage reading from it, or
does it crash the board?  Same question about your Parallella board.

How many bcrypt instances and cores is this?  Still 2 cycles/round,
right?  What clock rate?

Thanks again,

Alexander
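
For reference, the cost-5 extrapolation in this message can be reproduced with a short Python sketch.  It assumes, as the message does, 2^cost * 1024 Blowfish encryptions in the costly loop plus 585 fixed ones (9+512 before the loop, 64 after it).  The names FIXED_ENCRYPTIONS, encryptions() and scale_to_cost() are made up for illustration and are not taken from the JtR source.

```python
# Blowfish encryptions outside the cost-dependent loop, as counted in the
# message: 9 + 512 before the loop and 64 after it (64 bits of hash only).
FIXED_ENCRYPTIONS = 9 + 512 + 64  # 585


def encryptions(cost):
    """Total Blowfish encryptions for one bcrypt hash at the given cost."""
    return (1 << cost) * 1024 + FIXED_ENCRYPTIONS


def scale_to_cost(speed, measured_cost, target_cost=5):
    """Convert a c/s figure measured at one cost setting into the
    equivalent c/s at target_cost, assuming throughput in Blowfish
    encryptions per second stays the same."""
    return speed * encryptions(measured_cost) / encryptions(target_cost)


if __name__ == "__main__":
    # Measured speeds quoted in the message: (cost, c/s).
    for cost, speed in [(8, 678.6), (10, 179.1), (12, 45.42)]:
        print("cost %2d: %.0f c/s at cost 5" % (cost, scale_to_cost(speed, cost)))
    # Relative overhead of the fixed encryptions at cost 5.
    print("overhead at cost 5: %.1f%%" % (100.0 * FIXED_ENCRYPTIONS / (32 * 1024)))
```

The gap between these figures and the 3754 c/s actually measured at cost 5 is one rough indication of how much per-hash overhead remains outside the costly loop.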