
Date: Sat, 25 Jun 2011 03:48:46 +0400 From: Solar Designer <solar@...nwall.com> To: johnusers@...ts.openwall.com Subject: Re: John the Ripper 1.7.8: DES speedup On Thu, Jun 23, 2011 at 09:59:27PM +0100, Larry Bonner wrote: > For anyone interested, Brian published his algorithm on the following > page: http://gladman.plushost.co.uk/oldsite/cryptography_technology/serpent/index.php Thanks! On Thu, Jun 23, 2011 at 8:14 AM, Simon Marechal <simon@...quise.net> wrote: > This sound pretty interesting. Is there a writeup coming I'm not sure. We have no specific plans on this. > or some description of the approach ? I'll try to explain it briefly: Roman's primary idea was to start with breadthfirst search  that is, find Boolean functions for each possible truth table. Doing it for the full 6 inputs is impractical (we'd need 2**64 memory), but for 5 inputs it is practical. So we allocate a 2**32 entry array, indexed by truth table of 5to1 functions. Each filled entry holds the number of logic operations needed to reach that truth table from a certain initial state (typically from the 5 inputs supplied asis). Since our actual Sboxes are 6to4 rather than 5to1, this is followed by depthfirst search to choose 8 1bit outputs (and their corresponding 32bit truth tables, considering function complexities) that can be combined to produce our desired 4 output bits (with predefined target 64bit truth tables), with one of the 6 input bits being the selector bit. There's also the task of choosing which bit to use as the selector and which two operations (OR and XOR, AND and XOR, or ANDNOT and XOR) to use to apply it to a pair of 5to1 outputs (the selector bit is chosen for all 4 outputs at once, but the two operations may be chosen differently per output). Luckily, that's a small enough number of alternatives to try. Additionally, in some cases Roman chose to provide other than 5 direct inputs of the Sbox as the initial state to measure function complexities against. Finally, there's the task of reconstructing the chosen 5to1 functions and generating the proper expressions with intermediate results reuse (for the 8 partial outputs). Much of this requires a lot of manual tuning and retuning per Sbox. I'm sure I have missed many details, some of them for simplicity and some because Roman and not I was the one to work on this. (So I don't recall or even don't know some of the details, even though I do have a copy of the program code.) As I had mentioned, my involvement was code generation and feedback to Roman to make sure we're able to generate decent code. Roman was running my Perl scripts to estimate the number of registers and 2op instructions needed for different Sbox expressions. Then I also ran them, and improved them further, to choose the most suitable ones of the remaining Sbox expressions (thousands of them). My final revisions also estimated available parallelism, separately for 3op and 2op architectures. You can see some selection criteria in somewhat cryptic comments in the Sbox files in JtR 1.7.8, like this: /* s8019630, 41 gates, 14 regs, 11 andn, 4/21/60/101/143 stalls, 62 biop */ Not surprisingly, the goal to require fewer registers sometimes conflicted with the desire to have more parallelism. There were other conflicting criteria as well. Thus, the final Sbox files in 1.7.8 have several versions of almost every Sbox, with #if's. I also suggested a way to halve the big array (to 2**31 entries), but we never implemented that. This would be desirable if we decided to get a community involved in searching for even more optimal Sbox expressions  such that people with 32bit systems could participate too. (As currently implemented, Roman's program requires a 64bit system with at least 5 GB RAM, and this might not always be enough.) Alexander
Powered by blists  more mailing lists