john-dev - Re: USB-FPGA development

Message-ID: <39247887.7504009.1470067004590.JavaMail.yahoo@mail.yahoo.com>
Date: Mon, 1 Aug 2016 15:56:44 +0000 (UTC)
From:  <apingis@...nwall.net>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: Re: USB-FPGA development

Hi,

The FPGA-side application for descrypt is ready. Here are the details:

1. Communication framework improvement.
URL: https://github.com/Apingis/ztex_inouttraffic
The purpose of the improvement is to create an API that is independent of hardware implementation details (such as how the FPGAs are switched on the Ztex board, or USB details).
Another point is that the host side and the FPGA side exchange sequential packets of application data, and the framework provides functions for that.

2. Word Generator.
- Generates a word every cycle
- Implemented as a parametrized Verilog module (char_bits=7,ranges_max=8)
- It allows generation based on a supplied word list. However, it doesn't allow inserting the same word into more than one position, and I think such an ability wouldn't be useful for descrypt anyway.
- Allows specifying a starting index and the number of candidates to generate, for easy distribution of load among multiple FPGAs and boards
- C header: https://github.com/Apingis/ztex_inouttraffic/blob/master/host/pkt_comm/word_gen.h
- The generator and the rest of the framework use less than 2% of the FPGA's resources.
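
To illustrate how a starting index and a candidate count make it easy to split work among FPGAs, here is a minimal host-side sketch in Python. It is only a model of the idea, not the Verilog module and not the API from word_gen.h: the function names, the representation of a range as a string of characters, and the choice of the last position as the fastest-changing one are all assumptions made for illustration.

```python
from math import prod

# Hypothetical model: each position has its own list ("range") of characters.
# The real generator is configured through word_gen.h; names here are made up.

def keyspace_size(ranges):
    """Total number of candidates the ranges can produce."""
    return prod(len(r) for r in ranges)

def candidate_at(ranges, index):
    """Map a linear index to a candidate (mixed-radix decoding).

    Assumption: the last position changes fastest.
    """
    chars = []
    for r in reversed(ranges):
        index, digit = divmod(index, len(r))
        chars.append(r[digit])
    return "".join(reversed(chars))

def generate(ranges, start, count):
    """Yield `count` candidates beginning at linear index `start`."""
    total = keyspace_size(ranges)
    for i in range(start, min(start + count, total)):
        yield candidate_at(ranges, i)

def split_keyspace(total, n_units):
    """Split [0, total) into n_units contiguous (start, count) chunks."""
    base, extra = divmod(total, n_units)
    chunks, start = [], 0
    for u in range(n_units):
        count = base + (1 if u < extra else 0)
        chunks.append((start, count))
        start += count
    return chunks
```

Each (start, count) pair returned by split_keyspace() could be sent to a different FPGA, and together the chunks cover every candidate exactly once.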

3. crypt(3) Standard DES password cracker for Ztex 1.15y FPGA board.
URL: https://github.com/Apingis/ztex-descrypt
The project is based on the communication framework.
Features:
- After generation, candidates are transferred to the "arbiter" unit. The arbiter's task is to distribute candidates among cores and gather results.
- The design is split into cores. Each core includes a 16-stage crypt pipeline and 1 bsearch comparator. At any given time, 16 candidates are in the crypt pipeline and another 16 are in the comparator.
- Results are output in 2 types of packets: "Comparator found equality" and "Processing of an input packet done".
- The current version includes a prebuilt bitstream with 24 cores that operate at 216 MHz. Comparators hold up to 1023 entries in the hash table and operate at 156 MHz. This occupies 57% of the FPGA's resources.
- It performs at 700 MH/s with 511 or fewer hash table entries. If more hash table entries are used, the comparison stage becomes a bottleneck and performance decreases by approximately 10%.
- The architecture has reached an internal limit, so adding more cores doesn't improve performance. That's because crypt takes 400 cycles to complete and the arbiter processes one candidate every cycle. The word generator also generates one candidate every cycle. So in theory the system reaches its architectural limit at 25 cores x 16 candidates; the actual limit is a little less than that.
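
The limit in the last point can be checked with a few lines of arithmetic. The sketch below is a simplified model that assumes the arbiter's only cost is one cycle per candidate handed to a core; it ignores result gathering and any other overhead, which is why the real limit is somewhat lower. The function names are hypothetical.

```python
def max_useful_cores(crypt_cycles, candidates_per_core, dispatch_per_cycle=1):
    """Number of cores the arbiter can keep full.

    A core needs a fresh batch of `candidates_per_core` every `crypt_cycles`
    cycles, while the arbiter hands out `dispatch_per_cycle` candidates
    per cycle.
    """
    return (crypt_cycles * dispatch_per_cycle) // candidates_per_core

def arbiter_utilization(cores, crypt_cycles, candidates_per_core):
    """Fraction of arbiter cycles spent feeding `cores` cores (simplified model)."""
    return cores * candidates_per_core / crypt_cycles

print(max_useful_cores(400, 16))         # theoretical core limit
print(arbiter_utilization(24, 400, 16))  # load with the 24-core bitstream
```

With 400 cycles per crypt and 16 candidates per core this gives 25 cores, so under this model the 24-core bitstream already keeps the arbiter busy on nearly every cycle.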


I am now concentrating on JtR integration.
The biggest issue is how to use the candidate generator. There's nothing on this in 1.8.0-Jumbo-1.
I see mask mode development in bleeding-jumbo. The "mask" defined in mask.h and the on-board generator configuration are about the same and serve the same function. I have a question:
Mask mode doesn't use the "format" API. However, everything processed in mask mode must utilize all software features, such as crash recovery, distribution across nodes, collection of stats, etc., including features that are not implemented yet. That would definitely result in code duplication, which in turn would require more effort for further maintenance and development. What do you think?
There was a proposal by Solar to use the "format" API for mask mode: http://www.openwall.com/lists/john-dev/2012/04/30/4
If that proposal was attempted and rejected, it would be interesting to know the reasons. Were there any issues that were impossible to resolve?

Denis