Message-ID: <20180723152751.GA11178@openwall.com>
Date: Mon, 23 Jul 2018 17:27:51 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: Denis Burykin <apingis@...nwall.net>
Subject: sha512crypt & Drupal 7+ password cracking on FPGA

Hi,

As many of you are aware, we support descrypt and bcrypt password hash cracking on the old ZTEX 1.15y quad-FPGA boards. Threads:

http://www.openwall.com/lists/john-users/2016/11/06/1
http://www.openwall.com/lists/john-users/2017/06/25/1

Now Denis has also added support for sha512crypt and Drupal 7+ SHA-512 based password hashes on those same old boards.

We had achieved energy-efficiency improvement over current high-end GPUs at descrypt and bcrypt, and in the case of bcrypt also decent speed improvement per board and per rig (see further messages in the above threads). However, for sha512crypt and Drupal 7+ hashes we're merely on par with current high-end GPUs in terms of energy-efficiency and our speeds per-board are lower (it takes four or so boards to match one high-end GPU).

Thus, for practical purposes this is useful to those who have those boards anyway or would acquire such boards primarily for bcrypt and descrypt, so that the boards can also be put to more uses.

This is also valuable as being, to the best of my knowledge, the very first implementation of these two hash types on FPGA. And it is also our first attempt to use specialized soft CPU cores(*) along with cryptographic cores in an FPGA design to combine some limited flexibility (in this case, used to implement two higher-level hash types in one bitstream) with resource savings (no need to waste logic on sha512crypt's higher-level algorithm specifics) and efficient cryptographic cores (in this case, SHA-512).
Application of a similar approach to newer and much larger FPGAs (such as those available on AWS F1) will result in improvement over current GPUs at least in energy-efficiency (and for the largest FPGAs probably also in performance).

(*) Denis' bcrypt design uses microcode to save on logic, but it's a closer match to historical CPUs' wide microcode than to a CPU program. Maybe it'll help us implement bcrypt-pbkdf at some point, though.

Denis wrote a good description of the design with some ASCII diagrams, currently found here:

https://github.com/magnumripper/JohnTheRipper/tree/bleeding-jumbo/src/ztex/fpga-sha512crypt

Each soft CPU core is 16-way SMT (runs 16 hardware threads with their separate register files) and it controls four SHA-512 cores with each of those capable of up to four in-flight hash computations (most of the time only two are being computed, but there's some overlap between finishing processing on one pair of hashes and starting on the next).

One soft CPU core (plus its memory and glue logic) and four SHA-512 cores form a unit. The SHA-512 cores occupy 80% of the unit's area, so in those terms the overhead of using soft CPUs is at most 25% of the SHA-512 cores' area, that is 20/80 (but they actually help save on algorithm-specific logic). 10 units fit in one Spartan-6 LX150 FPGA. This means 10 soft CPU cores, 160 hardware threads, 40 SHA-512 cores, up to 160 in-flight SHA-512 per FPGA. Four times that per board.

Also included are an on-device candidate password generator (for mask mode, including in hybrid modes along with a wordlist coming from host, etc.) and a hash comparator (capable of up to 512 loaded hashes per salt; no limit on total loaded hashes as that's handled on host). This is similar to what Denis' designs for descrypt and bcrypt also have.

sha512crypt and Drupal 7+ hashes are two entry points into the program memory. (The Drupal 7+ program is much simpler than sha512crypt's.
It could also be more efficient on a more specialized design since it does not need unaligned access to the buffers, which we support for sha512crypt. Yet it's good to have it along with sha512crypt essentially for free.)

Per Xilinx tools, this design was supposed to work at 225 MHz. Unfortunately, in our testing it only works at this frequency with very few units built into the bitstream. We don't know exactly why (maybe it's the power draw). With 10 units, the design works reliably for us at 135 MHz on many boards tested, so that's what we set as the current default. It also sometimes works at higher frequencies such as 160 MHz, but other times not. This is configurable in john.conf.

Here's a test run against 512 same-salt sha512crypt hashes (good for quick reliability testing as all 512 are supposed to be cracked) on one board (4 FPGAs) at 135 MHz:

$ ./john -2='1A2B3C4D5E6F7G8H9I0J' --mask='?2?2?2?2?2' --format=sha512crypt-ztex --verbosity=1 pw-sha512crypt
[...]
Loaded 512 password hashes with no different salts (sha512crypt-ztex, crypt(3) $6$ [sha512crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
327g 0:00:00:42 62.00% (ETA: 15:55:22) 7.746g/s 47003p/s 47003c/s 16282KC/s 40447..40137
512g 0:00:01:05 DONE (2018-07-23 15:55) 7.825g/s 46950p/s 46950c/s 12179KC/s 40500..40190
Session completed

Four boards (16 FPGAs), 135 MHz:

$ ./john -2='1A2B3C4D5E6F7G8H9I0J' --mask='?2?2?2?2?2' --format=sha512crypt-ztex --verbosity=1 pw-sha512crypt
[...]
Loaded 512 password hashes with no different salts (sha512crypt-ztex, crypt(3) $6$ [sha512crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
378g 0:00:00:12 72.00% (ETA: 15:53:55) 30.45g/s 185656p/s 185656c/s 62318KC/s 40348..1AF58
512g 0:00:00:16 DONE (2018-07-23 15:53) 30.89g/s 185395p/s 185395c/s 51138KC/s 40000..40140
Session completed

Scaling efficiency 185395/46950/4 = 98.7%.
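
The scaling-efficiency figure is the multi-board rate divided by the ideal linear speedup (single-board rate times the number of boards). A minimal Python sketch of that arithmetic, using the c/s values from the DONE lines of the two 135 MHz runs; the function and variable names are illustrative only and not part of John the Ripper:

```python
def scaling_efficiency(multi_rate: float, single_rate: float, n_boards: int) -> float:
    """Return the fraction of ideal linear speedup achieved by n_boards boards."""
    return multi_rate / (single_rate * n_boards)


# c/s figures copied from the "DONE" lines of the two 135 MHz sha512crypt runs
one_board_cps = 46950
four_boards_cps = 185395

eff = scaling_efficiency(four_boards_cps, one_board_cps, 4)
print(f"{eff:.1%}")  # prints 98.7%
```

The same function can be applied to any other pair of one-board and multi-board runs, as long as both rates come from the same hash type and clock frequency.
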
Four boards (16 FPGAs), 160 MHz:

$ ./john -2='1A2B3C4D5E6F7G8H9I0J' --mask='?2?2?2?2?2' --format=sha512crypt-ztex --verbosity=1 pw-sha512crypt
[...]
Loaded 512 password hashes with no different salts (sha512crypt-ztex, crypt(3) $6$ [sha512crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
174g 0:00:00:04 32.00% (ETA: 15:57:33) 36.78g/s 216490p/s 216490c/s 94714KC/s 40044..1AF54
512g 0:00:00:14 DONE (2018-07-23 15:57) 36.44g/s 218647p/s 218647c/s 60310KC/s 40000..40340
Session completed

This is similar speed to what Jeremi Gosney reported for hashcat on one GTX 1080 Ti at stock clocks:

https://gist.github.com/epixoip/973da7352f4cc005746c627527e4d073

Hashtype: sha512crypt, SHA512(Unix)
Speed.Dev.#1.....: 216.0 kH/s (53.53ms)

Somehow a newer benchmark of 8x GTX 1080 Ti shows slightly higher speed per GPU:

https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505

Hashtype: sha512crypt $6$, SHA512 (Unix)
Speed.Dev.#1.....: 235.9 kH/s (96.29ms)
Speed.Dev.#2.....: 228.3 kH/s (50.67ms)
Speed.Dev.#3.....: 230.4 kH/s (50.22ms)
Speed.Dev.#4.....: 230.5 kH/s (50.18ms)
Speed.Dev.#5.....: 230.6 kH/s (50.16ms)
Speed.Dev.#6.....: 230.1 kH/s (50.27ms)
Speed.Dev.#7.....: 232.0 kH/s (49.85ms)
Speed.Dev.#8.....: 231.3 kH/s (50.01ms)
Speed.Dev.#*.....: 1849.1 kH/s

We're probably consuming around 160W for the boards (Denis measured 3.4A at 12V per board at 160 MHz, which translates to ~40W/board) or 180W at the wall at ~90% PSU efficiency. I guess GTX 1080 Ti might consume a little bit more at this benchmark (it's a 300W TDP card). Jeremi (or someone else who has one of those cards) can probably check via nvidia-smi while running hashcat.

Drupal 7+ hash, one board (4 FPGAs) at 135 MHz:

$ ./john -2='pasword' --mask='?2?2?2?2?2?2?2?2' --format=drupal7-ztex pw-drupal7
[...]
Loaded 1 password hash (Drupal7-ztex, $S$ [SHA512 ZTEX])
Cost 1 (iteration count) is 16384 for all loaded hashes
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:10 2.49% (ETA: 16:08:54) 0g/s 14250p/s 14250c/s 14250C/s prdowaap..oooarsap
0g 0:00:02:03 30.91% (ETA: 16:08:49) 0g/s 14421p/s 14421c/s 14421C/s awoppaas..rssoasas
0g 0:00:03:31 52.93% (ETA: 16:08:50) 0g/s 14427p/s 14427c/s 14427C/s wdwdwdow..pdawrprw
0g 0:00:06:20 95.21% (ETA: 16:08:51) 0g/s 14430p/s 14430c/s 14430C/s wpddwood..ppowrrod
password (?)
1g 0:00:06:28 DONE (2018-07-23 16:08) 0.002571g/s 14428p/s 14428c/s 14428C/s password..orpadord
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Four boards (16 FPGAs), 135 MHz:

$ ./john -2='pasword' --mask='?2?2?2?2?2?2?2?2' --format=drupal7-ztex pw-drupal7
[...]
Loaded 1 password hash (Drupal7-ztex, $S$ [SHA512 ZTEX])
Cost 1 (iteration count) is 16384 for all loaded hashes
Warning: Slow communication channel to the device. Increase mask or expect performance degradation.
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:10 10.23% (ETA: 16:01:23) 0g/s 56120p/s 56120c/s 56120C/s oaoopprp..rooddwrp
0g 0:00:00:35 35.24% (ETA: 16:01:26) 0g/s 56590p/s 56590c/s 56590C/s dwpadaws..ppawrrws
0g 0:00:01:01 60.25% (ETA: 16:01:27) 0g/s 56662p/s 56662c/s 56662C/s adwoowao..ssodwpso
password (?)
1g 0:00:01:39 DONE (2018-07-23 16:01) 0.01005g/s 56678p/s 56678c/s 56678C/s password..wsrssdrd
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Scaling efficiency 56678/14428/4 = 98.2% despite the complaint about too small a mask (too few different characters for the mask positions handled on device).

Four boards (16 FPGAs), 160 MHz:

$ ./john -2='pasword' --mask='?2?2?2?2?2?2?2?2' --format=drupal7-ztex pw-drupal7
[...]
Loaded 1 password hash (Drupal7-ztex, $S$ [SHA512 ZTEX])
Cost 1 (iteration count) is 16384 for all loaded hashes
Warning: Slow communication channel to the device. Increase mask or expect performance degradation.
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:12 14.78% (ETA: 16:11:22) 0g/s 65890p/s 65890c/s 65890C/s rpdroapa..dwdporpa
0g 0:00:00:31 36.38% (ETA: 16:11:25) 0g/s 66386p/s 66386c/s 66386C/s apawrrws..swarosos
0g 0:00:01:16 88.67% (ETA: 16:11:26) 0g/s 66586p/s 66586c/s 66586C/s soapawad..wpssppsd
password (?)
1g 0:00:01:24 DONE (2018-07-23 16:11) 0.01180g/s 66541p/s 66541c/s 66541C/s password..wsrssdrd
Use the "--show" option to display all of the cracked passwords reliably
Session completed

We'd appreciate more testing, perhaps on Royce's larger cluster of these boards. Please post your results as follow-ups to this message.

Alexander