Date: Wed, 31 May 2017 10:03:38 -0800
From: Royce Williams
Subject: Re: other algorithms on ZTEX 1.15y?

On Wed, May 31, 2017 at 9:33 AM, Solar Designer wrote:
> On Wed, May 31, 2017 at 06:07:56AM -0800, Royce Williams wrote:
>> Beyond the algorithms either already supported in john or implemented
>> elsewhere (descrypt, bcrypt, DES), what other algorithms are feasible
>> or worthwhile on ZTEX?
> Are you aware of bcrypt already implemented on ZTEX elsewhere?  Where
> exactly?  Have you tested?

I'm only aware of the work that you and Katja presented a couple of years ago.

> Regarding DES, are you referring to Gifts' implementation?  Have you
> tried using it, or anything else?

I'm only aware of Gifts', as I believe that he participated in the
work as described in the Positive Technologies blog here:

> Maybe we need to add a plain DES cracker mode to JtR, like I think
> hashcat has now (but not on FPGAs yet).

Yes, hashcat has DES (mode 14000) and 3DES (mode 14100) now.

> As to our developments so far, after the descrypt-ztex format Denis has
> also been working on bcrypt-ztex, citing speeds of ~105k c/s per board
> at bcrypt cost 5 - but this work is yet to be completed and merged.
> Actual speeds will vary by cracking mode since the current synchronous
> crypt_all() API combined with the not-so-fast USB interface results in
> significant idle time when the candidate passwords are fed from the
> host.  On-FPGA mask mode mostly avoids that (and so will an API revision
> for asynchronous processing, but we haven't gotten around to that yet).

Interesting - good to know!
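
To make the idle-time point above concrete, here is a rough back-of-the-envelope model. It is only a sketch under stated assumptions: the function names are hypothetical, the host feed rates are invented for illustration, and only the ~105k c/s figure comes from the quoted message. It compares a synchronous scheme, where the board waits while candidates are transferred, with an idealized scheme where transfer and computation fully overlap.

```python
def sync_rate(compute_cps: float, transfer_cps: float) -> float:
    """Effective candidates/s when transfer and compute alternate.

    Each candidate costs 1/transfer_cps seconds on the link plus
    1/compute_cps seconds on the board, and the two never overlap.
    """
    return 1.0 / (1.0 / compute_cps + 1.0 / transfer_cps)


def overlapped_rate(compute_cps: float, transfer_cps: float) -> float:
    """Idealized rate when transfer fully overlaps with compute."""
    return min(compute_cps, transfer_cps)


if __name__ == "__main__":
    compute = 105_000.0  # ~105k c/s per board, as cited in the thread
    # Host-to-board feed rates below are made-up illustrative values.
    for transfer in (200_000.0, 1_000_000.0, 10_000_000.0):
        s = sync_rate(compute, transfer)
        o = overlapped_rate(compute, transfer)
        print(f"feed {transfer:>12,.0f}/s  sync {s:>10,.0f}  overlapped {o:>10,.0f}")
```

In this simplified model the synchronous rate is always below the slower of the two stages, which is the gap that on-FPGA mask mode or an asynchronous API would be expected to narrow.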

>> This project is working on WPA2 support, which seems interesting:
>> From a brief review of the project's files, I infer that SHA1 and
>> PBKDF2 would be possible on ZTEX. Would they be worth the effort?
> For PBKDF2 with MD*/SHA-1/SHA-2, it should be possible to obtain
> GPU-like speeds on ZTEX, roughly like these boards worked for Bitcoin
> mining (thus, one quad-FPGA board is roughly the same as one high-end GPU
> from 2015 or so).  The purpose would be to put these boards to more
> general use and to achieve better energy efficiency (compared to GPUs).
> For fast unsalted hashes, good speeds may only be achieved for up to a
> few thousand hashes loaded for cracking.  This is a lot worse than with
> GPUs, which handle millions.  So focusing on PBKDF2 makes more sense.

I don't disagree about the priority, though I should point out that
there are also use cases for which only one hash, or a few hashes, are
the target.
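
For reference, the work a PBKDF2-SHA1 bitstream would have to do for WPA2-PSK can be written out on the host side. The sketch below is a plain Python rendering of PBKDF2 with HMAC-SHA1, using the standard WPA2 parameters (SSID as salt, 4096 iterations, 32-byte key). The function names are my own, and this says nothing about how an FPGA design would actually be structured.

```python
import hashlib
import hmac


def pbkdf2_hmac_sha1(password: bytes, salt: bytes, iterations: int, dklen: int) -> bytes:
    """PBKDF2 with HMAC-SHA1, written out to show the iteration structure."""
    out = b""
    block_index = 1
    while len(out) < dklen:
        # U_1 = HMAC(password, salt || INT_32_BE(block_index))
        u = hmac.new(password, salt + block_index.to_bytes(4, "big"), hashlib.sha1).digest()
        t = int.from_bytes(u, "big")
        # U_i = HMAC(password, U_{i-1}); the output block is the XOR of all U_i.
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hashlib.sha1).digest()
            t ^= int.from_bytes(u, "big")
        out += t.to_bytes(20, "big")  # SHA-1 digests are 20 bytes
        block_index += 1
    return out[:dklen]


def wpa2_pmk(passphrase: str, ssid: str) -> bytes:
    """WPA2-PSK pairwise master key: 4096 iterations, 32 bytes, SSID as salt."""
    return pbkdf2_hmac_sha1(passphrase.encode(), ssid.encode(), 4096, 32)


if __name__ == "__main__":
    print(wpa2_pmk("password", "IEEE").hex())
```

Since 32 bytes of output need two 20-byte blocks, each candidate passphrase costs 2 * 4096 = 8192 HMAC-SHA1 calls, so per-candidate work dominates and the number of loaded hashes matters far less than for fast unsalted hashes.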

> We didn't come up with a good enough idea for a generic password hashing
> soft CPU yet.  My current thinking is that, to avoid bumping into BRAM
> port count for the register file as we would with instructions doing
> little work each, maybe we should have different bitstreams for
> different crypto primitives like MD5, SHA-1, etc. (one at a time) and
> have those available through very high latency instructions in the soft
> CPU to allow for full pipelining - thus, 64 cycles latency for MD5, etc.
> We'd also have a handful of simpler instructions (same or similar in the
> different bitstreams) for implementing higher-level crypto schemes
> around the current bitstream's crypto primitive (this way, the same
> bitstream will be usable for multiple higher-level schemes sharing the
> same crypto primitive).  These would include data copying and control
> transfer instructions.  A tough question is how to combine the extreme
> high-latency crypto instructions with control flow transfer - do we have
> like 63 delay slots?  SPARC has 1, some DSPs have a few, but I've never
> heard of an ISA having tens of delay slots.  Yet maybe this is the way
> to go.
> Meanwhile, or alternatively, maybe we need PBKDF2-SHA* bitstreams.
> There are many JtR formats that use PBKDF2, so it would have been a
> primary candidate for implementation on the soft CPU anyway.
> For NTLM, we could use a soft CPU having an MD4 primitive, but then do
> we have anything else needing MD4?  Perhaps just raw-MD4?  That's very
> rare, and other MD4-based things are probably even more rare.  So
> perhaps a separate bitstream for NTLM as well, or maybe one usable for
> NTLM and for raw-MD4 (different placement of characters into the current
> block in on-FPGA mask mode; the rest of the difference can probably be
> handled on host).
> LM will need to be its own bitstream, although it could be a revision of
> the descrypt design.  Denis probably has specific thoughts on it.
> Technically, we could share a bitstream between descrypt and LM, as
> that's basically different IV (0 vs. non-0), iterations (25 vs. 1), and
> salt size (12 vs. 0 bits, but we can simply set the 12 bits to all 0's),
> but this would be suboptimal.
> Overall, most JtR formats (perhaps 90%+, with the exception of scrypt and
> the like) could be reasonably implemented for ZTEX, but a speedup over
> GPU is expected for only a few (bcrypt, maybe Lotus/Domino), the
> required effort is substantial, and there's almost no demand.

Fair points, and informative exploration of the potential. Thanks!
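
As a small illustration of the descrypt vs. LM comparison above, the three differences mentioned (initial block, iteration count, salt size) can be captured as parameters around one shared core loop. This is only a sketch: `DesHashParams`, `run_core`, and the `des` callback signature are hypothetical names of mine, the DES primitive itself is deliberately left as a pluggable function, and key setup (which also differs between the two schemes) is not shown.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (key, block, salt) -> block, all as integers.
DesFn = Callable[[int, int, int], int]


@dataclass(frozen=True)
class DesHashParams:
    initial_block: int  # 64-bit block fed into the first DES encryption
    iterations: int     # number of chained DES encryptions
    salt_bits: int      # width of the salt that perturbs DES


# descrypt: all-zero block, 25 iterations, 12-bit salt.
DESCRYPT = DesHashParams(initial_block=0, iterations=25, salt_bits=12)

# One LM half: the constant "KGS!@#$%" encrypted once, with no salt.
LM_HALF = DesHashParams(
    initial_block=int.from_bytes(b"KGS!@#$%", "big"),
    iterations=1,
    salt_bits=0,
)


def run_core(params: DesHashParams, key: int, salt: int, des: DesFn) -> int:
    """Run the shared loop; only the parameters differ between schemes."""
    if salt >> params.salt_bits:
        raise ValueError("salt does not fit in the scheme's salt width")
    block = params.initial_block
    for _ in range(params.iterations):
        block = des(key, block, salt)
    return block
```

A shared design like this spends resources on 25-iteration control and salt handling that LM never uses, which is one way to read the "suboptimal" remark in the quoted text.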

