Date: Tue, 26 Mar 2019 11:21:40 +0100
From: Solar Designer <>
Subject: Re: bcrypt cracking on ZTEX 1.15y FPGA boards (bcrypt-ztex)


After almost 2 years, we have a minor update to bcrypt-ztex:

On Sun, Jun 25, 2017 at 07:07:52PM +0200, Solar Designer wrote:
> Denis proceeded to work on bcrypt-ztex this year.  We had listed this as
> planned future work on Katja's project in 2014:
> but unfortunately didn't resume that project until this year.  I guess
> better late than never, especially given that the results achieved are
> still good even by modern standards (relative to current GPUs), despite
> those ZTEX 1.15y boards being rather old by now.  As far as I can
> tell, Denis' implementation is brand new, not building upon Katja's,
> although our past experience was of some indirect help.

Denis has now improved bcrypt-ztex, making it slightly faster (18 rather
than 19 cycles per Blowfish encryption, same clock rate, same number of
cores), bringing its hash comparator on par with other designs' (now up
to 512 hashes per salt), and reducing its idle power consumption
(through clock gating) to that of other recently revised *-ztex formats.

There's revised documentation of the design here:

> The speed is roughly ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
> overclocking, ~114k with overclocking.

The corresponding new speeds, also measured from a QubesOS VM with USB
traffic proxying via a sys-usb VM, are 111k c/s at the default 141 MHz
(the frequency reported by the design tools), and 120k c/s when
"overclocked" to 152 MHz (which also runs stably on the board I tested).
This is the expected 19/18 speedup over the previous revision.
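
As a quick sanity check of the 19/18 figure, here is a small Python
sketch that scales the previously reported speeds by the cycle-count
ratio.  It assumes speed is inversely proportional to cycles per Blowfish
encryption at an unchanged clock rate and core count; the function name
is made up for illustration only.

```python
def scaled_speed(old_cps, old_cycles=19, new_cycles=18):
    """Scale a c/s figure by the cycles-per-encryption ratio.

    Assumes everything else (clock rate, core count) stays the same.
    """
    return old_cps * old_cycles / new_cycles


# Previously reported speeds (approximate, from the 2017 message).
for label, old_cps in (("default clock", 106_000), ("overclocked", 114_000)):
    expected = scaled_speed(old_cps)
    print(f"{label}: ~{expected / 1000:.1f}k c/s expected")
```

Given that the old figures were themselves rounded, the results come out
close to the measured 111k and 120k c/s.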

> It should scale almost linearly
> with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
> on the same host).

Tests on real hardware (no VM):

One board (4 FPGAs), default clock rate:

$ ./john -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4 
SN 1: firmware uploaded
SN 1: uploading bitstreams.. ok
ZTEX 1 bus:2 dev:100 Frequency:141 141 141 141 
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:04  0g/s 0p/s 109605c/s 109605C/s aaaa..aaoa
74g 0:00:02:24  0.5122g/s 0p/s 111941c/s 111941C/s aaaa..aaoa
143g 0:00:04:46 30.77% (ETA: 11:02:10) 0.4995g/s 491.1p/s 111983c/s 111983C/s abcd..aaot
228g 0:00:06:42 76.92% (ETA: 10:55:23) 0.5665g/s 873.5p/s 111987c/s 111987C/s meow..aaov
239g 0:00:06:50 N/A 0.5829g/s 1114p/s 111714c/s 111714C/s alex..###q
Session completed

Four boards (16 FPGAs), default clock rate:

$ ./john -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4 
SN 2: firmware uploaded
SN 4: firmware uploaded
SN 1: firmware uploaded
SN 3: firmware uploaded
SN 3: uploading bitstreams.. ok
SN 1: uploading bitstreams.. ok
SN 4: uploading bitstreams.. ok
SN 2: uploading bitstreams.. ok
ZTEX 3 bus:2 dev:120 Frequency:141 141 141 141 
ZTEX 1 bus:2 dev:121 Frequency:141 141 141 141 
ZTEX 4 bus:2 dev:122 Frequency:141 141 141 141 
ZTEX 2 bus:2 dev:119 Frequency:141 141 141 141 
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
7g 0:00:00:07  0.9114g/s 0p/s 439400c/s 439400C/s aaaa..aaba
197g 0:00:02:33 DONE (2019-03-26 10:56) 1.283g/s 1832p/s 441461c/s 441461C/s snow..aamk
239g 0:00:03:00 N/A 1.327g/s 2537p/s 417170c/s 417170C/s mark..###q
Session completed

Using speeds seen during cracking, the scaling efficiency is:

441461/111987/4 = 98.5%
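
The same ratio as a small Python sketch, in case someone wants to plug in
their own numbers.  The helper name is hypothetical; the inputs are the
mid-run c/s figures from the two logs above.

```python
def scaling_efficiency(multi_cps, single_cps, boards):
    """Fraction of ideal linear scaling achieved by `boards` boards."""
    return multi_cps / (single_cps * boards)


eff = scaling_efficiency(multi_cps=441461, single_cps=111987, boards=4)
print(f"scaling efficiency: {eff:.2%}")
```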

The final speeds are lower because the last batch of candidate passwords
was too small to fully use the devices, especially four boards at once.
This effect would be smaller for longer runs or with fewer salts.  Also,
if I hadn't lowered the verbosity (which I did to avoid being flooded
with the many cracked passwords), we would have seen a warning about the
under-full last batch not using the hardware optimally.  We added such
warnings recently.

The running time reduction seen here is much less than 4x because the
single-board run was lucky to crack all passwords way before reaching
100% of the keyspace, whereas the multi-board run presumably processed
more of the keyspace.  The progress indicator is off, though - perhaps
because there were too few batches of candidate passwords given how many
passwords are needed per batch to fully utilize the hardware.  To
exhaust the keyspace at these speeds, it'd have taken 16 and 4 minutes,
but we're seeing 7 and 3 minutes, respectively.  This happens.
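
The 16 and 4 minute estimates can be reproduced with the Python sketch
below.  It assumes the ?l?l?l?l mask covers 26^4 candidates (lowercase
ASCII letters only) and that every candidate is hashed against all 239
salts for the whole run, so it is an upper bound: salts whose hashes are
already cracked would drop out in practice.

```python
KEYSPACE = 26 ** 4  # mask ?l?l?l?l, assuming 26 lowercase letters
SALTS = 239         # "239 different salts" in the logs above


def exhaust_minutes(cps):
    """Minutes to try every candidate against every salt at `cps` c/s."""
    return KEYSPACE * SALTS / cps / 60


# Mid-run speeds from the one-board and four-board logs.
for boards, cps in ((1, 111987), (4, 441461)):
    print(f"{boards} board(s): about {exhaust_minutes(cps):.1f} minutes")
```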

> I can't easily measure the power consumption right
> now, but I estimate it's ~20W as both the board (with a large but slowly
> rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
> warm to the touch.  These used to get much warmer in Bitcoin mining
> tests (known to be ~40W).

I underestimated.  Denis wrote "Current consumption (12V input): 2.2A,
idle 0.4A" in the documentation referenced above, which corresponds to
26W load, under 5W idle.  My own measurements:

hashes/second	clock rate	power active	power idle
111k+		141 MHz		32W		< 5W
120k+		152 MHz (o/c)	34W		< 5W

These power figures correspond to the exact tests above and were measured
on the 230V AC side, so they include power adapter overhead (estimated at
15% to 20% of the total).
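
To relate these numbers to each other, here is a rough Python sketch.  It
converts Denis' 12V current readings to watts, subtracts the estimated
15% to 20% adapter overhead from my AC-side readings, and derives c/s per
watt.  The helper names are invented for illustration, and the overhead
range is only my estimate from above, not a measurement.

```python
def dc_power_watts(amps, volts=12.0):
    """DC power drawn from the 12V input."""
    return amps * volts


def dc_range_from_ac(ac_watts, overhead=(0.15, 0.20)):
    """(low, high) DC-side power after removing adapter overhead.

    `overhead` is the estimated fraction of total AC power lost in the
    adapter; 15% to 20% is an assumption, not a measured value.
    """
    low_frac, high_frac = overhead
    return ac_watts * (1 - high_frac), ac_watts * (1 - low_frac)


print(f"Denis: {dc_power_watts(2.2):.1f}W load, {dc_power_watts(0.4):.1f}W idle")

for cps, mhz, ac_watts in ((111_000, 141, 32), (120_000, 152, 34)):
    low, high = dc_range_from_ac(ac_watts)
    print(f"{mhz} MHz: {ac_watts}W AC, roughly {low:.1f}W to {high:.1f}W DC, "
          f"{cps / ac_watts:.0f} c/s per AC watt")
```

At 141 MHz the estimated DC-side range brackets Denis' 26W figure, so the
two sets of measurements look consistent.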

> this board is 10% overvolted (extra resistors soldered on by the
> previous owner)

Not anymore.  All of the above results (including the stable "overclock"
to 152 MHz, 120k+ c/s) are at stock core voltage.

