Date: Mon, 3 Jul 2017 07:12:12 -0800
From: Royce Williams <royce@...ho.org>
To: john-users@...ts.openwall.com
Cc: Denis Burykin <apingis@...nwall.net>
Subject: Re: bcrypt cracking on ZTEX 1.15y FPGA boards (bcrypt-ztex)

On Sun, Jun 25, 2017 at 9:07 AM, Solar Designer <solar@...nwall.com> wrote:

> We finally got the bcrypt-ztex format into bleeding-jumbo this week.

Pretty great work - thanks again to you and Denis and anyone else who
has been working on this.

> The speed is roughly ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without
> overclocking, ~114k with overclocking.  It should scale almost linearly
> with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards
> on the same host).  I can't easily measure the power consumption right
> now, but I estimate it's ~20W as both the board (with a large but slowly
> rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely
> warm to the touch.  These used to get much warmer in Bitcoin mining
> tests (known to be ~40W).

Here are some tests on my cluster, as recently described here:

http://www.openwall.com/lists/john-users/2017/06/30/1

I discovered today that I had a USB power problem with two boards,
which I have fixed. (I had read that these boards require steady power
on the USB side, even though they are independently powered.) They are
still a little finicky, but I can usually coax them into working now.

I now have two more boards for a total of 16, so adjust any
calculations accordingly.

> Denis' implementation works around our current synchronous crypt_all()
> API by buffering a large number of candidate passwords - many times
> larger than the number of cores.  The current design has 124 bcrypt
> cores per chip, so 496 per board.  My tests are with "TargetSetting = 5"
> (tuning for bcrypt cost 5) in the "[ZTEX:bcrypt]" section in john.conf,
> and this results in:
>
> 0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 63488

I wasn't paying a lot of attention to it at the time, but looking at
john.log, unless I've lost track of something, my value was:

0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 262140

... for values of both 5 and 6 for TargetSetting.


My first tests were with all 16 boards.

The first test used the default john.conf [ZTEX:bcrypt] TargetSetting
= 6 value, with john compiled with the keys_per_crypt *= 2 tweak:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8
-mask='?w?l?l?l?l' pw-fake-unix

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex
[Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:54  0g/s 0p/s 1609Kc/s 1609KC/s loveaaaa..loveioia
0g 0:00:04:54  0g/s 0p/s 1611Kc/s 1611KC/s loveaaaa..loveioia
0g 0:00:09:53  0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:11:56  0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:12:18  0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:19:32  0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:22:06  0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:24:21  0g/s 0p/s 1612Kc/s 1612KC/s loveaaaa..loveioia
0g 0:00:27:30  0g/s 0p/s 1613Kc/s 1613KC/s loveaaaa..loveioia
0g 0:00:32:16 0.00% (ETA: 2030-12-18 00:34)
    0g/s 491.5p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:43:20 0.00% (ETA: 2035-07-31 03:45)
    0g/s 366.0p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:51:23  0g/s 308.6p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:00:57:27  0g/s 276.0p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:00:56  0g/s 260.3p/s 1613Kc/s 1613KC/s lovaaani..lovaioli
0g 0:01:13:00 0.00% (ETA: 2032-09-23 00:29) 0g/s 434.5p/s 1613Kc/s
1613KC/s lolaaatn..lolaiocn


That test ran at ~505W / 16 = ~31.6W per board, which includes the
power for the onboard fans. The power consumption actually jumps
around quite a bit between 495W and 515W, but 505W seemed about
average.

The second test was with 16 boards, changing to TargetSetting = 5, and
still with keys_per_crypt *= 2:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8
-mask='?w?l?l?l?l' pw-fake-unix

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex
[Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12  0g/s 0p/s 1625Kc/s 1625KC/s loveaaaa..loveaida
0g 0:00:02:02  0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:03:14  0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:08:22  0g/s 0p/s 1633Kc/s 1633KC/s loveaaaa..loveaida
0g 0:00:12:30  0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:17:57  0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:21:34  0g/s 0p/s 1632Kc/s 1632KC/s loveaaaa..loveaida
0g 0:00:24:27  0g/s 0p/s 1631Kc/s 1631KC/s loveaaaa..loveaida
0g 0:00:38:52 0.00% (ETA: 2031-03-22 14:56)
    0g/s 482.2p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy
0g 0:00:41:28 0.00% (ETA: 2032-02-20 19:37)
    0g/s 452.0p/s 1632Kc/s 1632KC/s lovaaaay..lovaaidy

For that test, I'd say that power was very slightly higher, maybe
averaging 510W, so ~31.9W per board. But this might be normal
variation.

So across the cluster, with the known tweaks and settings and without
overclocking, I'm getting 1.632Mc/s for ~510W.
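
The per-board and per-watt figures for the two 16-board runs can be
cross-checked with a short Python sketch. The function name is purely
illustrative, and the wattages are the approximate wall readings quoted
above, so treat the results as rough:

```python
def per_board_and_per_watt(total_cps, wall_watts, boards=16):
    """Return (watts per board, c/s per watt) for one cluster run."""
    return wall_watts / boards, total_cps / wall_watts

# (cluster c/s from the last status line, approximate wall power in W)
runs = {
    "TargetSetting = 6": (1_613_000, 505),
    "TargetSetting = 5": (1_632_000, 510),
}

for name, (cps, watts) in runs.items():
    w_board, cps_watt = per_board_and_per_watt(cps, watts)
    print(f"{name}: {w_board:.1f} W/board, {cps_watt:.0f} c/s per W")
```

Both settings land at roughly 3200 c/s per watt at the wall, so the
difference between them is well inside the observed power variation.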

Next, here are single-board versions of both tests, using the same
board. (I did this by disconnecting the other boards. Is there a way
to tell john to only use a specific device?)

First, TargetSetting = 5, keys_per_crypt *= 2:

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8
-mask='?w?l?l?l?l' pw-fake-unix
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex
[Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:14  0g/s 0p/s 106815c/s 106815C/s loveaaaa..loveaaoa
0g 0:00:03:12  0g/s 0p/s 107169c/s 107169C/s loveaaaa..loveaaoa
0g 0:00:05:44  0g/s 0p/s 107173c/s 107173C/s loveaaaa..loveaaoa
0g 0:00:06:51  0g/s 0p/s 107181c/s 107181C/s loveaaaa..loveaaoa
0g 0:00:10:36  0g/s 0p/s 107190c/s 107190C/s loveaaaa..loveaaoa
0g 0:00:15:34  0g/s 0p/s 107197c/s 107197C/s loveaaaa..loveaaoa
0g 0:00:20:13  0g/s 0p/s 107194c/s 107194C/s loveaaaa..loveaaoa
0g 0:00:24:07  0g/s 0p/s 107199c/s 107199C/s loveaaaa..loveaaoa


Then using TargetSetting at the default of 6, keys_per_crypt *= 2
(--progress-every, where have you been all my life?)

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8
-mask='?w?l?l?l?l' --progress-every=300 pw-fake-unix
ZTEX XXXXXXXXXX bus:1 dev:72 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex
[Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:01  0g/s 0p/s 102565c/s 102565C/s loveaaaa..loveomaa
0g 0:00:05:00  0g/s 0p/s 106748c/s 106748C/s loveaaaa..loveomaa
0g 0:00:10:00  0g/s 0p/s 106902c/s 106902C/s loveaaaa..loveomaa
0g 0:00:15:00  0g/s 0p/s 106952c/s 106952C/s loveaaaa..loveomaa
0g 0:00:20:00  0g/s 0p/s 106978c/s 106978C/s loveaaaa..loveomaa
0g 0:00:25:00  0g/s 0p/s 106991c/s 106991C/s loveaaaa..loveomaa
0g 0:00:30:00  0g/s 33.04p/s 107002c/s 107002C/s loveaaco..loveomco
0g 0:00:35:00  0g/s 28.32p/s 107010c/s 107010C/s loveaaco..loveomco
0g 0:00:40:00  0g/s 24.78p/s 107015c/s 107015C/s loveaaco..loveomco
0g 0:00:45:00  0g/s 22.02p/s 107019c/s 107019C/s loveaaco..loveomco
0g 0:00:50:00  0g/s 19.82p/s 107023c/s 107023C/s loveaaco..loveomco
0g 0:00:55:00  0g/s 18.02p/s 107027c/s 107027C/s loveaaco..loveomco
0g 0:01:00:00  0g/s 33.04p/s 107031c/s 107031C/s loveaavl..loveomvl
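
Putting the single-board rates next to the 16-board rates gives a rough
idea of how close the cluster is to linear scaling. This is only a
sketch: the helper name is made up for illustration, the cluster
figures are rounded to the nearest Kc/s as john printed them, and it
assumes the one board tested is representative of the other 15.

```python
def scaling_efficiency(cluster_cps, single_board_cps, boards=16):
    """Fraction of ideal linear scaling achieved by the whole cluster."""
    return cluster_cps / (single_board_cps * boards)

# Last status line of each stock-frequency run (cluster, single board).
ts5 = scaling_efficiency(1_632_000, 107_199)
ts6 = scaling_efficiency(1_613_000, 107_031)

print(f"TargetSetting = 5: {ts5:.1%} of linear")
print(f"TargetSetting = 6: {ts6:.1%} of linear")
```

Under those assumptions, both settings come out at about 94-95% of
16 x the single-board rate.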


Then I enabled the full cluster again.

Here are all 16 boards again, with TargetSetting = 5, the
keys_per_crypt *= 2 tweak, and Frequency = 152.

During this test, I was also trying to coax a 17th board into
usability. I include this test anyway because there appears to have
been a slight (temporary?) drop in performance associated with the
attempt to talk to that board (or it might be a coincidence; I will
test further to check this correlation):

Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex
[Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:01:03  0g/s 0p/s 1654Kc/s 1654KC/s loveaaaa..loveaida
0g 0:00:05:00  0g/s 0p/s 1655Kc/s 1655KC/s loveaaaa..loveaida
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: firmware uploaded
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
SN XXXXXXXXXX: uploading bitstreams.. ok
SN XXXXXXXXXX: device_list_check_bitstreams(): no bitstream or wrong type
0g 0:00:08:19  0g/s 0p/s 1644Kc/s 1644KC/s loveaaaa..loveaida
0g 0:00:10:00  0g/s 0p/s 1645Kc/s 1645KC/s loveaaaa..loveaida
0g 0:00:15:00  0g/s 0p/s 1649Kc/s 1649KC/s loveaaaa..loveaida
0g 0:00:18:03  0g/s 0p/s 1650Kc/s 1650KC/s loveaaaa..loveaida
0g 0:00:20:00  0g/s 0p/s 1650Kc/s 1650KC/s loveaaaa..loveaida
0g 0:00:25:00  0g/s 0p/s 1651Kc/s 1651KC/s loveaaaa..loveaida
0g 0:00:30:00  0g/s 0p/s 1652Kc/s 1652KC/s loveaaaa..loveaida
0g 0:00:35:00  0g/s 0p/s 1652Kc/s 1652KC/s loveaaaa..loveaida
0g 0:00:40:00 0.00% (ETA: 2031-08-16 00:16)
    0g/s 468.6p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy
0g 0:00:45:00 0.00% (ETA: 2033-05-21 14:48)
    0g/s 416.5p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy
0g 0:00:50:00 0.00% (ETA: 2035-02-25 05:21)
    0g/s 374.8p/s 1653Kc/s 1653KC/s lovaaaay..lovaaidy


And finally, a more focused example - all 16 boards, a single
artificial hash, with bcrypt work factor 12, with the same tweaks:

$ cat single-bf.hash
$2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yNbYSphoTp.

$ ./john -format=bcrypt-ztex -inc=lower -min-len=8 -max-len=8
-mask='?w?l?l?l?l' --progress-every=300 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 0.00% (ETA: 2017-12-16 07:45)
    0g/s 14299p/s 14299c/s 14299C/s loveisxm..lovehjfc
0g 0:00:05:00 0.00% (ETA: 2017-12-17 12:58)
    0g/s 14422p/s 14422c/s 14422C/s laliawhy..lalidtdh
0g 0:00:10:01 0.00% (ETA: 2017-12-17 19:40)
    0g/s 14417p/s 14417c/s 14417C/s bebeapeq..bebednqq
0g 0:00:15:00 0.01% (ETA: 2017-12-17 17:52)
    0g/s 14417p/s 14417c/s 14417C/s lalluepc..lallqhbd
0g 0:00:20:00 0.01% (ETA: 2017-12-17 20:20)
    0g/s 14414p/s 14414c/s 14414C/s pinaidtw..pinahzrz
0g 0:00:25:00 0.01% (ETA: 2017-12-17 18:51)
    0g/s 14413p/s 14413c/s 14413C/s poleiswt..polehjjm
0g 0:00:30:00 0.01% (ETA: 2017-12-17 20:20)
    0g/s 14412p/s 14412c/s 14412C/s locakkyv..locaeocf
0g 0:00:35:00 0.01% (ETA: 2017-12-17 21:23)
    0g/s 14412p/s 14412c/s 14412C/s beednaol..beedbyas
0g 0:00:40:00 0.02% (ETA: 2017-12-17 20:20)
    0g/s 14414p/s 14414c/s 14414C/s popenwkp..popebtuj
0g 0:00:45:00 0.02% (ETA: 2017-12-17 19:31)
    0g/s 14416p/s 14416c/s 14416C/s luiznpnr..luizbnil
0g 0:00:50:00 0.02% (ETA: 2017-12-17 18:51)
    0g/s 14417p/s 14417c/s 14417C/s boolnupg..boolbakp
0g 0:00:55:00 0.02% (ETA: 2017-12-17 19:40)
    0g/s 14418p/s 14418c/s 14418C/s puthpzto..puthocln
0g 0:01:00:00 0.02% (ETA: 2017-12-17 19:06)
    0g/s 14419p/s 14419c/s 14419C/s joespjwb..joesolvk
0g 0:01:01:00 0.03% (ETA: 2017-12-17 18:31)
    0g/s 14419p/s 14419c/s 14419C/s johoiouh..johohbdu

This pulled about 560W from the wall.
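
As a sanity check on the ETAs above, here is a minimal Python sketch of
the time needed to exhaust the keyspace at the measured rate. It
assumes the attack covers every 8-letter lowercase candidate (26^8),
which is my reading of the command line rather than something john
printed; the function name is illustrative.

```python
KEYSPACE = 26 ** 8   # assumption: all 8-letter lowercase candidates
RATE = 14_419        # c/s, from the last status line above

def days_to_exhaust(keyspace, rate):
    """Days needed to try every candidate at a constant rate."""
    return keyspace / rate / 86_400

print(f"{days_to_exhaust(KEYSPACE, RATE):.1f} days")
```

That works out to a little under 168 days, which is in line with the
mid-December ETAs john reports.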


I tried to compare this to john on my general-purpose GPU system. It
isn't working the way I expect (it appears to be using only one GPU,
and I'm not sure yet what I'm doing wrong):

$ ./john --format=bcrypt-opencl --device=gpu --fork=6 -inc=lower
-min-len=8 -max-len=8 -mask='?w?l?l?l?l' --progress-every=300
--max-run-time=3660 single-bf.hash
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-opencl [Blowfish OpenCL])
Node numbers 1-6 of 6 (fork)
Device 3: GeForce GTX 1080
Device 0: GeForce GTX 1080
Device 5: GeForce GTX 1080
Device 4: GeForce GTX 1080
Device 1: GeForce GTX 1080
Device 2: GeForce GTX 1080
[ptxas info elided]
Press 'q' or Ctrl-C to abort, almost any other key for status
1 0g 0:00:01:16 0.00% (ETA: 2037-12-19 05:32) 0g/s 53.38p/s 53.38c/s
53.38C/s GPU:34C lilluela..lilleoya

... but perhaps all six GPUs together would run at 53.38c/s x 6 = ~320c/s?


I also compared GPU performance with hashcat.

First, with max power throttled down to 150W per card from the default
of 180W, which is how I usually run:

$ hashcat -w 4 -a 3 -m 3200 single-bf.hash ?l?l?l?l?l?l?l
hashcat (v3.6.0-44-g21d10215+) starting...

OpenCL Platform #1: NVIDIA Corporation
======================================
* Device #1: GeForce GTX 1080, 2028/8113 MB allocatable, 20MCU
* Device #2: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #3: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #4: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #5: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU
* Device #6: GeForce GTX 1080, 2028/8114 MB allocatable, 20MCU

Hashes: 1 digests; 1 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes, 5/13 rotates

Applicable optimizers:
* Zero-Byte
* Single-Hash
* Single-Salt
* Brute-Force

Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger disabled.

[s]tatus [p]ause [r]esume [b]ypass [c]heckpoint [q]uit =>

Session..........: hashcat
Status...........: Running
Hash.Type........: bcrypt $2*$, Blowfish (Unix)
Hash.Target......: $2a$12$S7H1VijH5FFkU/1bWeM98ObKGC6BwfjNnhsPFs3U88yN...phoTp.
Time.Started.....: Sun Jul  2 20:51:46 2017 (9 mins, 31 secs)
Time.Estimated...: Thu Nov  2 04:23:40 2017 (122 days, 7 hours)
Guess.Mask.......: ?l?l?l?l?l?l?l [7]
Guess.Queue......: 1/1 (100.00%)
Speed.Dev.#1.....:      128 H/s (154.07ms)
Speed.Dev.#2.....:      125 H/s (157.26ms)
Speed.Dev.#3.....:      127 H/s (154.83ms)
Speed.Dev.#4.....:      127 H/s (154.09ms)
Speed.Dev.#5.....:      128 H/s (154.25ms)
Speed.Dev.#6.....:      126 H/s (155.12ms)
Speed.Dev.#*.....:      760 H/s
Recovered........: 0/1 (0.00%) Digests, 0/1 (0.00%) Salts
Progress.........: 422400/8031810176 (0.01%)
Rejected.........: 0/422400 (0.00%)
Restore.Point....: 0/308915776 (0.00%)
Candidates.#1....: oarieri -> ombreri
Candidates.#2....: ovhteri -> oibzana
Candidates.#3....: osdyban -> ojkhana
Candidates.#4....: opwzana -> ozanana
Candidates.#5....: oufgeri -> ocwzana
Candidates.#6....: oxckier -> ohydana
HWMon.Dev.#1.....: Temp: 35c Fan:100% Util:100% Core:1911MHz Mem:4513MHz Bus:8
HWMon.Dev.#2.....: Temp: 35c Fan:100% Util:100% Core:1873MHz Mem:4513MHz Bus:4
HWMon.Dev.#3.....: Temp: 39c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:16
HWMon.Dev.#4.....: Temp: 35c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:4
HWMon.Dev.#5.....: Temp: 34c Fan:100% Util:100% Core:1911MHz Mem:4513MHz Bus:1
HWMon.Dev.#6.....: Temp: 33c Fan:100% Util:100% Core:1898MHz Mem:4513MHz Bus:1
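
The hashcat status output is internally consistent, which a few lines
of Python can confirm. The values are copied from the status screen
above; the variable names are just for illustration.

```python
# Per-device speeds in H/s, from Speed.Dev.#1 through Speed.Dev.#6.
device_speeds = [128, 125, 127, 127, 128, 126]
total_speed = sum(device_speeds)   # hashcat itself reports 760 H/s overall

keyspace = 26 ** 7                 # ?l x 7, same as the Progress total
days = keyspace / 760 / 86_400     # time to exhaust at the reported rate

print(f"sum of devices: {total_speed} H/s")
print(f"keyspace: {keyspace}, about {days:.1f} days at 760 H/s")
```

The per-device figures add up to within 1 H/s of the reported total
(the displayed values are rounded), and the run time comes out at a bit
over 122 days, matching Time.Estimated.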


Restoring the GPUs' default max power limit (180W) made no difference
at all for a single $12$ bcrypt hash.

In both cases, the GPU system was pulling 500W from the wall, and the
GPUs hardly broke a sweat, temperature-wise. There may be ways to get
more performance from hashcat for this hash type and work factor, but
that will take some research on my part.

So if I'm reading this right, for single-hash bcrypt with work factor
12, just using my own hardware and techniques to compare, the best
performance available to me so far on FPGA (14419c/s) is about 19
times as fast as the best performance I know how to get on my GPU
system (760H/s), at around the same power consumption:

FPGA: 14419c/s / 560W = ~25.75c/s/W
GPU: 760H/s / 500W = 1.52H/s/W

So for a focused, single-hash attack on a modern target using my own
gear, FPGA is ~17 times as efficient as GPU?
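
For anyone who wants to redo this comparison with their own numbers,
the arithmetic is sketched below in Python. The rates and wall-power
readings are the ones quoted in this message; the power figures are
approximate, and the variable names are illustrative.

```python
fpga_cps, fpga_watts = 14_419, 560   # 16 ZTEX boards, bcrypt cost 12
gpu_hps, gpu_watts = 760, 500        # 6x GTX 1080 under hashcat, cost 12

speed_ratio = fpga_cps / gpu_hps     # raw throughput advantage
fpga_eff = fpga_cps / fpga_watts     # c/s per watt at the wall
gpu_eff = gpu_hps / gpu_watts        # H/s per watt at the wall
eff_ratio = fpga_eff / gpu_eff

print(f"speed: {speed_ratio:.1f}x")
print(f"FPGA {fpga_eff:.2f} c/s/W vs GPU {gpu_eff:.2f} H/s/W")
print(f"efficiency: {eff_ratio:.1f}x")
```

The efficiency ratio is smaller than the speed ratio only because the
FPGA cluster drew somewhat more power than the GPU system during these
runs.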

I will also do some testing without the keys_per_crypt *= 2 tweak, and
with different keys_per_crypt values, but I wanted to get this posted.

Royce
