Date: Tue, 26 Mar 2019 11:21:40 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: apingis@...nwall.net
Subject: Re: bcrypt cracking on ZTEX 1.15y FPGA boards (bcrypt-ztex)

Hi,

After almost 2 years, we have a minor update to bcrypt-ztex:

On Sun, Jun 25, 2017 at 07:07:52PM +0200, Solar Designer wrote:
> Denis proceeded to work on bcrypt-ztex this year. We had listed this as
> planned future work on Katja's project in 2014:
>
> http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/
>
> but unfortunately didn't resume that project until this year. I guess
> better late than never, especially given that the results achieved are
> still good even by modern standards (relative to current GPUs), despite
> of those ZTEX 1.15y boards being rather old by now. As far as I can
> tell, Denis' implementation is brand new, not building upon Katja's,
> although our past experience was of some indirect help.

Denis has now improved bcrypt-ztex making it slightly faster (18 rather
than 19 cycles per Blowfish encryption, same clock rate, same number of
cores), improving its hash comparator to be on par with other designs'
(now up to 512 hashes per salt), and reducing its idle power consumption
(through clock gating) to that of other recently revised *-ztex formats.

There's revised documentation of the design here:

https://github.com/magnumripper/JohnTheRipper/tree/bleeding-jumbo/src/ztex/fpga-bcrypt

> The speed is roughly ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
> overclocking, ~114k with overclocking.

The corresponding new speeds, also measured from a QubesOS VM with USB
traffic proxying via a sys-usb VM, are 111k c/s at the default 141 MHz
(the design tools' reported frequency), and 120k c/s when "overclocked"
to 152 MHz (also works stable on this board I tested). This is the
expected 19/18 speedup over the previous revision.

> It should scale almost linearly
> with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
> on the same host).

Tests on real hardware (no VM):

One board (4 FPGAs), default clock rate:

$ ./john -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4
SN 1: firmware uploaded
SN 1: uploading bitstreams.. ok
ZTEX 1 bus:2 dev:100 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:04 0g/s 0p/s 109605c/s 109605C/s aaaa..aaoa
74g 0:00:02:24 0.5122g/s 0p/s 111941c/s 111941C/s aaaa..aaoa
143g 0:00:04:46 30.77% (ETA: 11:02:10) 0.4995g/s 491.1p/s 111983c/s 111983C/s abcd..aaot
228g 0:00:06:42 76.92% (ETA: 10:55:23) 0.5665g/s 873.5p/s 111987c/s 111987C/s meow..aaov
239g 0:00:06:50 N/A 0.5829g/s 1114p/s 111714c/s 111714C/s alex..###q
Session completed

Four boards (16 FPGAs), default clock rate:

$ ./john -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4
SN 2: firmware uploaded
SN 4: firmware uploaded
SN 1: firmware uploaded
SN 3: firmware uploaded
SN 3: uploading bitstreams.. ok
SN 1: uploading bitstreams.. ok
SN 4: uploading bitstreams.. ok
SN 2: uploading bitstreams.. ok
ZTEX 3 bus:2 dev:120 Frequency:141 141 141 141
ZTEX 1 bus:2 dev:121 Frequency:141 141 141 141
ZTEX 4 bus:2 dev:122 Frequency:141 141 141 141
ZTEX 2 bus:2 dev:119 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
7g 0:00:00:07 0.9114g/s 0p/s 439400c/s 439400C/s aaaa..aaba
197g 0:00:02:33 DONE (2019-03-26 10:56) 1.283g/s 1832p/s 441461c/s 441461C/s snow..aamk
239g 0:00:03:00 N/A 1.327g/s 2537p/s 417170c/s 417170C/s mark..###q
Session completed

Using speeds seen during cracking, the scaling efficiency is:

441461/111987/4 = 98.5%

The final speeds are lower because the last batch of candidate passwords
was too small to fully use the devices, especially four boards at once.
This effect would be smaller for longer runs or with fewer salts. Also,
if I didn't lower the verbosity (which I did to prevent being flooded
with the many cracked passwords) we would have seen a warning about the
under-full last batch not using the hardware optimally. We added such
warnings recently.

The running time reduction seen here is much less than 4x because the
single-board run was lucky to crack all passwords way before reaching
100% of the keyspace, whereas the multi-board run presumably processed
more of the keyspace. The progress indicator is off, though - perhaps
because there were too few batches of candidate passwords given how many
passwords are needed per batch to fully utilize the hardware. To exhaust
the keyspace at these speeds, it'd have taken 16 and 4 minutes, but
we're seeing 7 and 3 minutes, respectively. This happens.

> I can't easily measure the power consumption right
> now, but I estimate it's ~20W as both the board (with a large but slowly
> rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
> warm to the touch. These used to get much warmer in Bitcoin mining
> tests (known to be ~40W).

I underestimated. Denis wrote "Current consumption (12V input): 2.2A,
idle 0.4A" in the documentation referenced above, which corresponds to
26W load, under 5W idle. My own measurements:

hashes/second   clock rate      power active    power idle
111k+           141 MHz         32W             < 5W
120k+           152 MHz (o/c)   34W             < 5W

Power consumption corresponds to these exact tests, and is measured for
230V AC, so includes power adapter overhead (estimated 15% to 20% of
total).

> this board is 10% overvolted (extra resistors soldered on by the
> previous owner)

Not anymore. All of the above results (including the stable "overclock"
to 152 MHz, 120k+ c/s) are at stock core voltage.

Alexander
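
The arithmetic behind the figures quoted in this message can be
re-derived with a short Python sketch. All inputs are copied from the
text; the function names are hypothetical. It assumes that the mask
?l?l?l?l spans 26^4 lowercase candidates and that c/s counts
candidate/salt combinations, so exhausting the keyspace takes
keyspace * salts / speed seconds.

```python
# Cross-check of figures quoted in the message. Inputs come from the text;
# function names are illustrative only, not part of John the Ripper.

def expected_speed(old_cps, old_cycles=19, new_cycles=18):
    """Speed after cutting cycles per Blowfish encryption at the same clock."""
    return old_cps * old_cycles / new_cycles

def scaling_efficiency(multi_cps, single_cps, boards):
    """Fraction of ideal linear scaling achieved across several boards."""
    return multi_cps / (single_cps * boards)

def exhaust_minutes(keyspace, salts, cps):
    """Minutes to try every candidate against every salt (assumed c/s meaning)."""
    return keyspace * salts / cps / 60

def dc_power_watts(volts, amps):
    """DC power at the board's 12V input."""
    return volts * amps

def dc_from_ac(ac_watts, adapter_overhead):
    """DC-side estimate from a wall measurement, given the adapter's share."""
    return ac_watts * (1 - adapter_overhead)

keyspace = 26 ** 4  # assumption: ?l covers the 26 lowercase ASCII letters

print(f"19/18 speedup: {expected_speed(106_000):.0f} and "
      f"{expected_speed(114_000):.0f} c/s")
print(f"scaling efficiency: {scaling_efficiency(441461, 111987, 4):.1%}")
print(f"exhaust, 1 board:  {exhaust_minutes(keyspace, 239, 111987):.1f} min")
print(f"exhaust, 4 boards: {exhaust_minutes(keyspace, 239, 441461):.1f} min")
print(f"12V input: {dc_power_watts(12, 2.2):.1f} W load, "
      f"{dc_power_watts(12, 0.4):.1f} W idle")
print(f"32W at the wall: {dc_from_ac(32, 0.20):.1f} to "
      f"{dc_from_ac(32, 0.15):.1f} W DC")
```

Under these assumptions the results agree with the text: roughly 111.9k
and 120.3k c/s for the 19/18 speedup, 98.5% scaling efficiency, about 16
and 4 minutes to exhaust the keyspace, and 26.4W load with 4.8W idle at
the 12V input. The 32W wall measurement less 15% to 20% adapter overhead
also brackets Denis' 26W figure.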