Date: Fri, 29 Mar 2019 13:25:48 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: apingis@...nwall.net
Subject: Re: DES-based crypt(3) cracking on ZTEX 1.15y FPGA boards (descrypt-ztex)

Hi,

After almost 2 years, we have an update to descrypt-ztex:

On Thu, Jun 29, 2017 at 08:39:16PM +0200, Solar Designer wrote:
> On Sun, Nov 06, 2016 at 03:06:53PM +0100, Solar Designer wrote:
> > This implements descrypt aka traditional DES-based crypt(3) hash
> > cracking on ZTEX 1.15y quad Spartan-6 LX150 FPGA boards.  Some of you in
> > here have these or compatibles (such as the US clones of the originally
> > German ZTEX boards).  Most of these boards previously worked as Bitcoin
> > miners, and were then resold on eBay and such at a fraction of the
> > original price.  Those we bought for development cost us between 100 EUR
> > (lately) and 250 EUR each (earlier).  They have become rare on eBay, but
> > I guess some asking around in cryptocurrency forums will do the trick
> > since there were a lot of those boards around, and only a fraction ever
> > reached eBay.  ZTEX itself does not sell them anymore.
> > 
> > As implemented by Denis, the "descrypt-ztex" format supports "mask mode"
> > (with on-device mask), hybrid modes (where you add a mask on top of
> > another mode, referring to the previous mode's generated portions of
> > candidate passwords with the "?w" mask), up to 2047 hashes per salt
> > (with on-device comparator) so up to a few million hashes loaded total
> > perhaps (given a good salt distribution), and it can work with one or
> > multiple ZTEX boards at once.
> 
> Besides his recently committed work on bcrypt-ztex Denis has also been
> trying to redesign descrypt-ztex.  While his attempts were promising
> (with ~50% greater expected speeds), they have mostly failed so far with
> difficult-to-debug issues.  Given the low demand for any of this (with
> it being mostly an experiment), I asked Denis to stop trying to get much
> better speeds for now and instead gather whatever minor optimizations
> he could get working quickly and commit those - and he did.  The result
> is a design that should run approx. 19/17 times = ~12% faster, and can
> be overclocked slightly (5% or so) on top of that.

> > Performance is up to about 740M hash computations per second (with room
> > for further improvement).
> 
> I am now getting ~806M c/s at standard clocks, ~840M at 5% overclocking
> (which appears stable on this board, but YMMV).  This is with the same
> Qubes USB pass-through as I described for my bcrypt-ztex testing here:
> 
> http://www.openwall.com/lists/john-users/2017/06/25/1
> 
> Performance should be higher without the virtualization (or with USB
> controller pass-through rather than individual device proxying).
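
For reference, here is the arithmetic behind the quoted "up to a few million hashes" estimate, as a small Python sketch.  It assumes only that descrypt's salt is 12 bits (4096 possible values) plus the 2047-hashes-per-salt comparator limit quoted above; the constant and function names are illustrative, not taken from the JtR source.  The result is a hard ceiling, reachable only if hashes were spread evenly over all 4096 salts.

```python
# descrypt (traditional DES-based crypt(3)) uses a 12-bit salt.
DESCRYPT_SALTS = 2 ** 12       # 4096 possible salt values
MAX_HASHES_PER_SALT = 2047     # on-device comparator limit quoted above


def max_loaded_hashes(salts_used: int = DESCRYPT_SALTS) -> int:
    """Upper bound on loaded hashes when every used salt is filled to the limit."""
    if not 0 <= salts_used <= DESCRYPT_SALTS:
        raise ValueError("descrypt has only 4096 possible salts")
    return salts_used * MAX_HASHES_PER_SALT


print(max_loaded_hashes())     # 8384512, i.e. ~8.4 million
```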

Denis has now made another attempt at getting further descrypt-ztex
optimizations to work, armed with the experience we gained during Denis'
experiments with other designs.  Specifically, we learned that designs
with fewer clock domains work more reliably at high device utilization
and power draw.  Previously, the descrypt-ztex design ran DES cores at
one clock rate (220 MHz by default) and comparators at another (160 MHz
by default, which was sufficient).  Denis has now optimized the
comparators to run at the DES cores' full clock rate and brought them
into the same clock domain.  With this change, he was able to get his
intended optimizations working.

There's revised documentation of the design here:

https://github.com/magnumripper/JohnTheRipper/tree/bleeding-jumbo/src/ztex/fpga-descrypt

Previously, descrypt-ztex had one shared on-device candidate password
generator feeding 24 descrypt cores.  More cores could easily fit in the
device, but the generator's capacity was sufficient to feed only ~23.5
cores, so adding more than 24 cores made no sense.

In the revised design, there are now two big units, each with its own
candidate password generator feeding its own set of 16 descrypt cores,
for a total of 32 cores.

The design tools' reported frequency is 221 MHz, which is similar to
what we had before, so the expected speedup would be 32/23.5 or +36%.
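
The +36% estimate simply compares core counts at roughly the same clock rate.  A minimal Python sketch, treating the old design as effectively 23.5 cores since that is all its shared generator could feed:

```python
OLD_EFFECTIVE_CORES = 23.5   # old design: limited by its single shared generator
NEW_CORES = 2 * 16           # new design: two units of 16 cores each

speedup = NEW_CORES / OLD_EFFECTIVE_CORES
print(f"{speedup:.3f}x, i.e. +{(speedup - 1) * 100:.0f}%")   # 1.362x, i.e. +36%
```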

Unfortunately, as we've already seen with other designs trying to
utilize the devices more fully (with bcrypt-ztex being the only
exception), the boards become unreliable at the full clock rate.  In our
testing, the new design is reliable across many boards at 190 MHz.
The luckiest board I have seems to run the new design OK at 215 MHz.
Nevertheless, even at 190 MHz the new design is ~17% faster than the
previous one was at its 220 MHz.
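
The ~17% figure follows from scaling that estimate by the two clock rates, assuming throughput is proportional to (effective cores) x (clock rate).  A quick check in Python; the helper name is just for illustration:

```python
def relative_throughput(cores: float, mhz: float) -> float:
    # Assumes speed scales linearly with both core count and clock rate.
    return cores * mhz


old = relative_throughput(23.5, 220)   # previous design at its 220 MHz default
new = relative_throughput(32, 190)     # revised design at the reliable 190 MHz
print(f"+{(new / old - 1) * 100:.1f}%")   # +17.6%
```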

As Denis wrote in his GitHub pull request, at 190 MHz the "theoretical
performance is 973 Mc/s, measured 950-960 Mc/s regardless of number of
hashes to compare" and "current consumption: 2.8A, at idle 0.4A" at 12V.
This corresponds to a power consumption of about 34W, which is similar to what
we had before.  And yes, this design update also adds clock gating (all
of the *-ztex designs have this now), bringing the idle power
consumption to under 5W.
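
The power figures are just the measured current times the 12V supply voltage; in Python:

```python
SUPPLY_VOLTS = 12.0


def watts(amps: float) -> float:
    # P = V * I for the board's 12V input
    return SUPPLY_VOLTS * amps


print(f"load: {watts(2.8):.1f} W, idle: {watts(0.4):.1f} W")   # load: 33.6 W, idle: 4.8 W
```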

I ran many tests of the new descrypt-ztex yesterday and today.  For this
posting, I'll include tests with length 7 passwords, as I think that's what
Hashcat benchmarks use, although descrypt-ztex speeds don't actually
vary by length (they may in equivalent attacks on CPU and GPU) and I've
also confirmed that things work right for all other password lengths.

One board (4 FPGAs), 190 MHz:

$ ./john -form=descrypt-ztex -inc=lower -min-len=7 -max-len=7 -mask='?w?l?l?l?l' pw-fake-len7
Warning! Section [list.ztex:devices] overridden by john-local.conf
SN 2: firmware uploaded
SN 2: uploading bitstreams.. ok
ZTEX 2 bus:2 dev:86 Frequency:190 190 190 190
Using default input encoding: UTF-8
Loaded 464 password hashes with 442 different salts (descrypt-ztex, traditional crypt(3) [DES ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
buttons          (u560-des)
cowboys          (u1009-des)
[...]
208g 0:00:05:04  0.6825g/s 0p/s 972444Kc/s 1026MC/s kenneth..benaikz
[...]
iforget          (u2181-des)
Warning: Only 1449 candidates left, minimum 1792 needed for performance.
awesome          (u539-des)
tequila          (u2775-des)
464g 0:00:13:55 N/A 0.5555g/s 9619Kp/s 970537Kc/s 1001MC/s tequila..xqj####
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This is an actual speed of 970M+ c/s, which is closer to the theoretical
peak of 973M than what Denis observed.  I guess this might be due to the
faster host system (desktop vs. laptop) or to running against many salts.
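
The 973M figure can be reproduced from the core count and clock rate if one assumes that each core completes one DES encryption per clock cycle and that each descrypt hash takes 25 DES encryptions.  The 25 iterations are a property of descrypt itself; the one-DES-per-cycle part is an assumption made for this sketch because it matches the quoted number, not something taken from the design documentation:

```python
DES_PER_DESCRYPT = 25   # descrypt applies (salt-modified) DES 25 times per hash


def peak_cps(fpgas: int, cores_per_fpga: int, mhz: float) -> float:
    # Assumption: each core completes one DES encryption per clock cycle.
    des_per_second = fpgas * cores_per_fpga * mhz * 1e6
    return des_per_second / DES_PER_DESCRYPT


peak = peak_cps(fpgas=4, cores_per_fpga=32, mhz=190)   # one board
measured = 970_537e3                                   # final c/s of the run above
print(f"peak {peak / 1e6:.1f}M c/s, measured {measured / peak:.1%} of peak")
# peak 972.8M c/s, measured 99.8% of peak
```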

Four boards (16 FPGAs), 190 MHz:

$ ./john -form=descrypt-ztex -inc=lower -min-len=7 -max-len=7 -mask='?w?l?l?l?l' pw-fake-len7
Warning! Section [list.ztex:devices] overridden by john-local.conf
ZTEX 3 bus:2 dev:79 Frequency:190 190 190 190
ZTEX 1 bus:2 dev:80 Frequency:190 190 190 190
ZTEX 4 bus:2 dev:81 Frequency:190 190 190 190
ZTEX 2 bus:2 dev:78 Frequency:190 190 190 190
Using default input encoding: UTF-8
Loaded 464 password hashes with 442 different salts (descrypt-ztex, traditional crypt(3) [DES ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
buttons          (u560-des)
cowboys          (u1009-des)
[...]
428g 0:00:04:21 40.78% (ETA: 00:19:45) 1.637g/s 12534Kp/s 3802Mc/s 3959MC/s oahaaaa..oaharsf
[...]
vermont          (u2845-des)
457g 0:00:04:38 61.17% (ETA: 00:16:39) 1.643g/s 17674Kp/s 3799Mc/s 3947MC/s vermont..phyarsf
zxcvbnm          (u345-des)
airwolf          (u1654-des)
zepplin          (u2912-des)
phoenix          (u223-des)
Warning: Only 3241 candidates left, minimum 3584 needed for performance.
awesome          (u539-des)
tequila          (u2775-des)
iforget          (u2181-des)
464g 0:00:04:41 N/A 1.650g/s 28584Kp/s 3798Mc/s 3944MC/s iforget..xqj####
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Scaling efficiency:

3798000/970537/4 = 97.8%
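
Spelled out, that is the four-board speed divided by four times the single-board speed, both in Kc/s as reported by john:

```python
one_board_kcs = 970_537      # single-board run at 190 MHz
four_boards_kcs = 3_798_000  # four-board run at 190 MHz ("3798Mc/s")
boards = 4

efficiency = four_boards_kcs / (one_board_kcs * boards)
print(f"{efficiency:.1%}")   # 97.8%
```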

The running time reduction seen here is much less than 4x (4:41 vs.
13:55) because both runs happened to crack all passwords before reaching
100% of the keyspace, but the single-board run got luckier.  The running
time is also reduced by salts being eliminated as more passwords get
cracked.  (To search this full keyspace against all 442 salts without
eliminating any, it would have taken about 1 hour on one board or about
15 minutes on four boards.)
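
The full-keyspace estimates in the parenthetical above can be checked roughly as follows.  This sketch assumes that the "lower" incremental charset has 26 characters, so that 3 incremental characters plus 4 "?l" positions give a 26^7 keyspace, and that every candidate is tried against all 442 salts:

```python
KEYSPACE = 26 ** 7   # assumed: 26-char lowercase, length 7 (3 from -inc, 4 from ?l)
SALTS = 442


def full_search_minutes(cps: float) -> float:
    # john's c/s counts crypts, i.e. one per candidate per salt.
    return KEYSPACE * SALTS / cps / 60


one_board = full_search_minutes(970_537e3)
four_boards = full_search_minutes(3_798_000e3)
print(f"{one_board:.1f} min, {four_boards:.1f} min")   # 61.0 min, 15.6 min
```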

The luckiest board (4 FPGAs), 215 MHz:

$ ./john -form=descrypt-ztex -inc=lower -min-len=7 -max-len=7 -mask='?w?l?l?l?l' pw-fake-len7
Warning! Section [list.ztex:devices] overridden by john-local.conf
ZTEX 2 bus:2 dev:72 Frequency:215 215 215 215
Using default input encoding: UTF-8
Loaded 464 password hashes with 442 different salts (descrypt-ztex, traditional crypt(3) [DES ZTEX])
[...]
59g 0:00:01:23  0.7051g/s 0p/s 1104Mc/s 1191MC/s benaaaa..benaibg
[...]
Warning: Only 1737 candidates left, minimum 1760 needed for performance.
awesome          (u539-des)
tequila          (u2775-des)
464g 0:00:12:19 N/A 0.6270g/s 10869Kp/s 1098Mc/s 1132MC/s tequila..xqj####

We're able to get nearly 1100M c/s here, which is close to the theoretical
maximum for this clock rate.
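
One way to see that the 215 MHz result is in line with the 190 MHz one: if speed scales linearly with clock rate, the ratio of the two measured speeds should match the ratio of the clock rates.  A quick Python check using the final c/s figures from the two single-board runs above:

```python
measured_mcs = {190: 970.537, 215: 1098.0}   # final c/s (in M) of the two runs

clock_ratio = 215 / 190
speed_ratio = measured_mcs[215] / measured_mcs[190]
print(f"clock x{clock_ratio:.3f}, speed x{speed_ratio:.3f}")   # clock x1.132, speed x1.131
```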

Alexander
