Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Thu, 14 Jun 2012 14:49:22 +0200
From: Tavis Ormandy <taviso@...xchg8b.com>
To: john-dev@...ts.openwall.com
Subject: notes on sharding the incremental search space

Hey, just FYI, I recently implemented parallelisation for john on an
unusual cluster architecture, and had to solve one of the problems
listed as open on the wiki. I don't know if my solution is general
enough to be helpful to anyone else, but I think my notes might at least
be useful to someone else trying to solve a similar problem in future!

Solar describes the problem here: http://www.openwall.com/lists/john-users/2005/11/21/2

The problem is how to shard the incremental search space for k-cores
optimally. Most existing parallel implementations appear to shard the
space on order entry boundaries, this is easy to do as they're natural
boundaries in the incremental algorithm. Unfortunately, as solar
explains, these are uneven and can grow very large.

The parallel external filter distributes the work between nodes
perfectly, but scales very poorly. The problem is that each node still
iterates through the search space but just discards them before the
crypt(), so there's significant work duplicated per core, making it
infeasible for more than a dozen or so cores.

Ideally, I wanted a way to split the entire space into an arbitrary number
of equal shards. For example, it's natural to want to split the seach
space over 8 cores like this:

    [  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  ]

But this defeats the purpose of using John's optimised search order,
because this happens:

    High Quality -------- Low Quality --------- Bad Quality
    [  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  ]

There are cores working on low quality candidates, while just a handful
are working on good quality pieces of the search space.

For example, a good quality guess might be "catbob12", you want this
password tested early. A bad quality guess might be "$"%%^FwA)", which
you dont want to waste any cycles testing until all the better
candidates have been exhausted.

I would like to split it like this instead, into thousands of small
chunks (many times the number of cores available):

    High Quality -------- Low Quality --------- Bad Quality
    [0|1|2|3|4| ............................... | 9999999 ]

This way each core works on the highest quality chunk available, before
moving on to the next. This can't be done on order boundaries, because
there are only ~3k of them, and even if there were more, they're so
uneven that it wouldn't make sense.

To solve this problem, I needed a way to calculate the number of crypt()
operations required to complete one order entry, and because one order
might span multiple boundaries, the state at an arbitrary point within
each order.

The core part of the incremental algorithm is inc_key_loop(), here is a
minimal version that doesn't do any work, just to simplify the problem:

// A simplified version of the inc_key_loop() algorithm from John the Ripper
static uint64_t inc_key_loop(uint32_t length, uint32_t fixed, uint32_t count)
{
    int32_t numbers_cache;
    int32_t pos = 0;
    uint64_t crypts = 0;
    uint8_t numbers[8] = {0};

    numbers[fixed]  = count;

update_ending:

    numbers_cache   = numbers[length];

update_last:

    // A crypt() operation would happen here.
    crypts++;

    pos = length;

    if (fixed < length) {
        if (++numbers_cache <= count) {
            if (length >= 2) goto update_last;
            numbers[length] = numbers_cache;
            goto update_ending;
        }
        numbers[pos--] = 0;
        while (pos > fixed) {
            if (++numbers[pos] <= count) goto update_ending;
            numbers[pos--] = 0;
        }
    }
    while (pos-- > 0) {
        if (++numbers[pos] < count) goto update_ending;
        numbers[pos] = 0;
    }

    return crypts;
}

A complete incremental search is going to require CharsetLen^NumChars
crypts(), so there are three problems to solve in order to split it into
an arbitrary number of shards:

* How do you calculate how many crypts() each { length, fixed, count }
  triplet requires, so I know which one contains the next shard
  boundary?
* What will the state be in inc_key_loop() after n crypts (where 0 < n <
  total), so that I can skip to the correct position within each entry?
* And once I can calculate the position in the search I want this node
  to search from, how do I use it?

Someone asked a similar question here, but I think they abandoned the idea:

http://www.openwall.com/lists/john-users/2011/06/29/3

This sounds like what I wanted. So, the base complexity of an order
triplet is pretty simple, it will take pow(count + 1, length) crypts()
to complete. This works when fixed=0, but any other value really
complicates the calculation. Here is my solution:

// Given a { length, fixed, count } triplet, calculate the maximum amount of
// work expected in inc_key_loop().
static uint64_t calculate_maximum_attempts(uint32_t length,
                                           uint32_t fixed,
                                           uint32_t count)
{
    int64_t result;
    int32_t pos;

    // A table of coefficients required to compensate for fixed characters in
    // JtR character files. These were found using Mathematica integer sequence
    // analsysis.
    static const int64_t kOrderCoefficients[][8] = {
        { 0,   0,   0,   0,   0,   0,   0,   0 },
        { 0,  -1,   0,   0,   0,   0,   0,   0 },
        { 0,  -2,   1,   0,   0,   0,   0,   0 },
        { 0,  -3,   3,  -1,   0,   0,   0,   0 },
        { 0,  -4,   6,  -4,   1,   0,   0,   0 },
        { 0,  -5,  10, -10,   5,  -1,   0,   0 },
        { 0,  -6,  15, -20,  15,  -6,   1,   0 },
        { 0,  -7,  21, -35,  35, -21,   7,  -1 },
    };

    // Calculate base complexity of this rule.
    result = pow(count + 1, length);

    // Compensate for fixed characters.
    for (pos = 0; pos < kMaxLength; pos++) {
        result += kOrderCoefficients[fixed][pos] * pow(count + 1, length - pos);
    }

    return result;
}

The core to allow predicting the inc_key_loop() is to understand that
numbers[] is effectively a k-digit base-n representation of the number
of crypts executed so far (approximately). Again, fixed characters
complicate this.

The fixed digits can be solved by taking the other digits that would be
displaced by the fixed position, then translating them into one base
lower than the rest of the digits (!!). Here is my solution:

// Given a { length, fixed, count } triplet, calculate the internal state of
// inc_key_loop() after the specified number of crypt operations.
static bool calculate_number_state(uint8_t  *numbers,
                                   uint32_t  length,
                                   uint32_t  fixed,
                                   uint32_t  count,
                                   uint64_t  crypts)
{
    // Compensate for the number_cache in inc_key_loop() skipping certain
    // increments of numbers[].
    if (fixed != length)
        crypts = crypts - (crypts % (count + 1));

    // Convert number of crypts into a k-digit base-n representation
    // of crypts. This closely matches how John stores it's internal state,
    // with the exception of fixed digits.
    convert_to_base(numbers, kMaxLength, count + 1, crypts);

    // Move the digits above the fixed digit down one position, because the
    // fixed digit will displace them. This is the same as partial
    // multiplication of these digits by base, but moving them is more
    // efficient than having to convert them to integers.
    memmove(&numbers[0], &numbers[1], fixed);

    // Now we can force the fixed digit into the correct position.
    numbers[fixed] = count;

    // Finally, we need to adjust the digits above the fixed position
    // independently. This is because John will never set a digit in those
    // positions to count, however other arithmetic rules still apply.
    //
    // We can interpret this algorithm behaviour as setting these digits to one
    // base lower than the rest of the digits. Yes, this is confusing.
    convert_to_base(numbers,
                    fixed,
                    count,
                    convert_from_base(numbers, fixed, count + 1));

    // Complete.
    return true;
}

With these routines you can calculate John's internal state after an
arbitrary number of crypts, so to skip to the 81365th shard, you do
this (in pseudocode):

total_search_space = CharsetLen^NumChars;
total_number_shards = 9999999;
this_shard_start = 81365 * (total_search_space / total_number_shards);

Now skip through order entries until you find the one that would pass
the this_shard_start boundary, and then call calculate_number_state() on
crypts_expected_after_this_entry - crypts_from_previous_entry_until_boundary.

Once you know the entry number and the numbers[] state, I found it
simple to generate a john.rec file for each node, and just -restore it,
immediately jumping to the correct position. I used a format like this:

  static const char kJohnSessionFormat[] =
    "REC3\n"                                   // Magic
    "4\n"                                      // Number of Arguments
    "--incremental=Whatever\n"                 // Incremental Mode
    "--external=Whatever\n"                    // External Filter
    "%s\n"                                     // Password File
    "0\n"                                      // Time         (unused)
    "0\n"                                      // Count        (unused)
    "00000000\n"                               // CryptNum Lo  (unused)
    "00000000\n"                               // CryptNum Hi  (unused)
    "0\n"                                      // Pass         (unused)
    "-1\n"                                     // Progress     (unused)
    "%08x\n"                                   // Check        (CHR Checksum)
    "%u\n"                                     // Entry        (Incremental)
    "0\n"                                      // Compat       (unused)
    "8\n"                                      // Length       (MaxPasswordLen)
    "%u\n%u\n%u\n%u\n%u\n%u\n%u\n%u\n";        // Numbers      (State)

And you can stop after a workunit has completed like this:

[List.External:Whatever]
int i;
void init () { i = 0; }
void filter () { abort = i++ > 1234454; }

I've attached a .c file demonstrating this, it expects all.chr on stdin
and then tries to skip to the final 123456th candidate of each order. I
used similar code to generate the next REC file on each node, then
collect the results when it's done (I know the FAQ says not to rely on
the REC file format, but presumably if you're reading this you're not
afraid of reading the code) :-)

If exposing this via a supported interface (e.g. the config language, or
a command line option) has a chance of making the official distribution,
let me know and I would be happy to write patch.

Hope this helps someone in future.

Tavis.

p.s. I also have a sha-1 implementation that's a little faster than the
jumbo version, would this be the right list to send that to? Is there a
jumbo cvs repo I can checkout to patch against?

-- 
-------------------------------------
taviso@...xchg8b.com | pgp encrypted mail preferred
-------------------------------------------------------

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
#include <stdlib.h>
#include <math.h>
#include <assert.h>

// Test code to demonstrate traversal of the incremental search space in John the Ripper.
//
// Author: taviso@...xchg8b.com, Jun 2012.

struct header {
    uint32_t    version;
    uint32_t    check[6];
    uint8_t     min;
    uint8_t     max;
    uint8_t     length;
    uint8_t     count;
    uint32_t    offsets[8];
    struct {
        uint8_t length;
        uint8_t fixed;
        uint8_t count;
    } order[3420];
} __attribute__((packed));

static const uint32_t kMinLength     = 8;
static const uint32_t kMaxLength     = 8;
static const uint32_t kCharsetLen    = 95;
static const uint32_t kTotalShards   = 1 << 21;

// static const uint64_t kTotalKeySpace = pow(kCharsetLen, kMaxLength);
// $ bc <<< 'obase=16;95^8'
// 1791C60F6FED01
static const uint64_t kTotalKeySpace = 0x1791C60F6FED01ULL;

// $ bc <<< 'obase=16;(95^8)/(2^21)'
// BC8E307B
static const uint64_t kWorkUnit      = 0xBC8E307BULL;

// Convert an array of numdigits integers of a specified base into an integer.
static uint64_t convert_from_base(uint8_t *digits,
                                  uint8_t  numdigits,
                                  uint8_t  base)
{
    uint64_t result;
    uint32_t i;

    for (result = i = 0; numdigits--; i++) {
        result += digits[numdigits] * pow(base, i);
    }

    return result;
}

// Convert an integer into an array of digits.
static bool convert_to_base(uint8_t *digits,
                            uint8_t  numdigits,
                            uint8_t  base,
                            uint64_t value)
{
    int32_t i;

    // Not possible to represent numbers in an unsigned integer base less than
    // one, return failure.
    if (base == 0) return false;

    // Distribute value digits appropriately.
    for (i = numdigits - 1; i >= 0; i--) {
        digits[i] = value % base;
        value     = value / base;
    }

    return true;
}

// Given a { length, fixed, count } triplet, calculate the maximum amount of
// work expected in inc_key_loop().
static uint64_t calculate_maximum_attempts(uint32_t length,
                                           uint32_t fixed,
                                           uint32_t count)
{
    int64_t result;
    int32_t pos;

    // A table of coefficients required to compensate for fixed characters in
    // JtR character files.
    static const int64_t kOrderCoefficients[][8] = {
        { 0,   0,   0,   0,   0,   0,   0,   0 },
        { 0,  -1,   0,   0,   0,   0,   0,   0 },
        { 0,  -2,   1,   0,   0,   0,   0,   0 },
        { 0,  -3,   3,  -1,   0,   0,   0,   0 },
        { 0,  -4,   6,  -4,   1,   0,   0,   0 },
        { 0,  -5,  10, -10,   5,  -1,   0,   0 },
        { 0,  -6,  15, -20,  15,  -6,   1,   0 },
        { 0,  -7,  21, -35,  35, -21,   7,  -1 },
    };

    // Calculate base complexity of this rule.
    result = pow(count + 1, length);

    // Compensate for fixed characters.
    for (pos = 0; pos < kMaxLength; pos++) {
        result += kOrderCoefficients[fixed][pos] * pow(count + 1, length - pos);
    }

    return result;
}

// Given a { length, fixed, count } triplet, calculate the internal state of
// inc_key_loop() after the specified number of crypt operations.
//
// Note that the state produced may be a few crypts early due to the
// number_cache behaviour. This is unavoidable, john does the same thing with
// REC files internally.
static bool calculate_number_state(uint8_t  *numbers,
                                   uint32_t  length,
                                   uint32_t  fixed,
                                   uint32_t  count,
                                   uint64_t  crypts)
{
    // Compensate for the number_cache in inc_key_loop() skipping certain
    // increments of numbers[].
    if (fixed != length)
        crypts = crypts - (crypts % (count + 1));

    // Convert number of crypts into a k-digit base-n representation
    // of crypts. This closely matches how John stores it's internal state,
    // with the exception of fixed digits.
    convert_to_base(numbers, kMaxLength, count + 1, crypts);

    // Move the digits above the fixed digit down one position, because the
    // fixed digit will displace them. This is the same as partial
    // multiplication of these digits by base, but moving them is more
    // efficient than having to convert them to integers.
    memmove(&numbers[0], &numbers[1], fixed);

    // Now we can force the fixed digit into the correct position.
    numbers[fixed] = count;

    // Finally, we need to adjust the digits above the fixed position
    // independently. This is because John will never set a digit in those
    // positions to count, however other arithmetic rules still apply.
    //
    // We can interpret this algorithm behaviour as setting these digits to one
    // base lower than the rest of the digits. Yes, this is confusing.
    convert_to_base(numbers,
                    fixed,
                    count,
                    convert_from_base(numbers, fixed, count + 1));

    // Complete.
    return true;
}

// A simplified version of the inc_key_loop() algorithm from John the Ripper
// used for verifying prediction results.
static uint64_t inc_key_loop(uint8_t  *numbers,
                             uint32_t  length,
                             uint32_t  fixed,
                             uint32_t  count)
{
    int32_t numbers_cache;
    int32_t pos = 0;
    uint64_t crypts = 0;

    numbers[fixed]  = count;

update_ending:

    numbers_cache   = numbers[length];

update_last:

    // A crypt() operation would happen here.
    crypts++;

    pos = length;

    if (fixed < length) {
        if (++numbers_cache <= count) {
            if (length >= 2) goto update_last;
            numbers[length] = numbers_cache;
            goto update_ending;
        }
        numbers[pos--] = 0;
        while (pos > fixed) {
            if (++numbers[pos] <= count) goto update_ending;
            numbers[pos--] = 0;
        }
    }
    while (pos-- > 0) {
        if (++numbers[pos] < count) goto update_ending;
        numbers[pos] = 0;
    }

    return crypts;
}


int main(int argc, char **argv)
{
    struct header header;   // Header of CHR file.
    uint64_t crypts;        // Total number of crypt() operations expected.
    uint64_t estimated;     // Estimated number of crypts() required to complete this order entry.
    uint32_t entry;         // Current order index (a table stored in the CHR files).
    uint64_t result;
    uint32_t i;

    // There are two important components of JtR internal state for incremental
    // cracking. First, the entry number (essentially an index into the order
    // table of the CHR files). Second is the numbers array, for which the bulk
    // of the algorithm is in inc_key_loop() in John's inc.c.
    uint8_t john_numbers_state[kMaxLength];

    // Read in CHR file from stdin.
    fread(&header, sizeof header, 1, stdin);

    fprintf(stderr, "Version:   %08X\n", header.version);
    fprintf(stderr, "Check:     %08X\n", header.check[0]);
    fprintf(stderr, "Min:       %u\n", header.min);
    fprintf(stderr, "Max:       %u\n", header.max);
    fprintf(stderr, "Length:    %u\n", header.length);
    fprintf(stderr, "Count:     %u\n", header.count);
    fprintf(stderr, "Offsets (ignored):\n");
    fprintf(stderr, "\t%08X %08X %08X %08X %08X\n\t...\n",
            header.offsets[0],
            header.offsets[1],
            header.offsets[2],
            header.offsets[3],
            header.offsets[4]);

    for (entry = 0, crypts = 0;
         entry < (sizeof header.order / sizeof header.order[0]);
         entry++) {

        // This logic duplicated from do_incremental_crack()
        if (header.order[entry].fixed && !header.order[entry].count)
            continue;
        if (header.order[entry].length + 1 < kMinLength)
            continue;
        if (header.order[entry].length >= kMaxLength)
            continue;
        if (header.order[entry].count >= kCharsetLen)
            continue;

        // Calculate worst case requirements for crypt() operations this entry
        // requires. We still need to adjust for 'Fixed' characters, which is
        // done with the table below.
        estimated = calculate_maximum_attempts(header.order[entry].length,
                                               header.order[entry].fixed,
                                               header.order[entry].count);



        if (estimated < 12345678)
            continue;

       crypts = estimated - 12345678;

        // Find the state after num crypt operations, so we can skip to an
        // arbitrary position.
        if (calculate_number_state(john_numbers_state,
                                   header.order[entry].length,
                                   header.order[entry].fixed,
                                   header.order[entry].count,
                                   crypts) == false) {
            fprintf(stderr, "error: calculate_number_state() returned failure\n");
            return 1;
        }

        fprintf(stderr, "\tentry %u { %u, %u, %u }, ~%llu crypt()\n",
                        entry,
                        header.order[entry].length,
                        header.order[entry].fixed,
                        header.order[entry].count,
                        estimated);

        result = inc_key_loop(john_numbers_state,
                              header.order[entry].length,
                              header.order[entry].fixed,
                              header.order[entry].count);

        fprintf(stderr, "crypts executed %llu\n", result);
    }

    return 0;
}


Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ