
Date: Thu, 14 Jun 2012 14:49:22 +0200 From: Tavis Ormandy <taviso@...xchg8b.com> To: johndev@...ts.openwall.com Subject: notes on sharding the incremental search space Hey, just FYI, I recently implemented parallelisation for john on an unusual cluster architecture, and had to solve one of the problems listed as open on the wiki. I don't know if my solution is general enough to be helpful to anyone else, but I think my notes might at least be useful to someone else trying to solve a similar problem in future! Solar describes the problem here: http://www.openwall.com/lists/johnusers/2005/11/21/2 The problem is how to shard the incremental search space for kcores optimally. Most existing parallel implementations appear to shard the space on order entry boundaries, this is easy to do as they're natural boundaries in the incremental algorithm. Unfortunately, as solar explains, these are uneven and can grow very large. The parallel external filter distributes the work between nodes perfectly, but scales very poorly. The problem is that each node still iterates through the search space but just discards them before the crypt(), so there's significant work duplicated per core, making it infeasible for more than a dozen or so cores. Ideally, I wanted a way to split the entire space into an arbitrary number of equal shards. For example, it's natural to want to split the seach space over 8 cores like this: [ 0  1  2  3  4  5  6  7  8 ] But this defeats the purpose of using John's optimised search order, because this happens: High Quality  Low Quality  Bad Quality [ 0  1  2  3  4  5  6  7  8 ] There are cores working on low quality candidates, while just a handful are working on good quality pieces of the search space. For example, a good quality guess might be "catbob12", you want this password tested early. A bad quality guess might be "$"%%^FwA)", which you dont want to waste any cycles testing until all the better candidates have been exhausted. I would like to split it like this instead, into thousands of small chunks (many times the number of cores available): High Quality  Low Quality  Bad Quality [01234 ...............................  9999999 ] This way each core works on the highest quality chunk available, before moving on to the next. This can't be done on order boundaries, because there are only ~3k of them, and even if there were more, they're so uneven that it wouldn't make sense. To solve this problem, I needed a way to calculate the number of crypt() operations required to complete one order entry, and because one order might span multiple boundaries, the state at an arbitrary point within each order. The core part of the incremental algorithm is inc_key_loop(), here is a minimal version that doesn't do any work, just to simplify the problem: // A simplified version of the inc_key_loop() algorithm from John the Ripper static uint64_t inc_key_loop(uint32_t length, uint32_t fixed, uint32_t count) { int32_t numbers_cache; int32_t pos = 0; uint64_t crypts = 0; uint8_t numbers[8] = {0}; numbers[fixed] = count; update_ending: numbers_cache = numbers[length]; update_last: // A crypt() operation would happen here. crypts++; pos = length; if (fixed < length) { if (++numbers_cache <= count) { if (length >= 2) goto update_last; numbers[length] = numbers_cache; goto update_ending; } numbers[pos] = 0; while (pos > fixed) { if (++numbers[pos] <= count) goto update_ending; numbers[pos] = 0; } } while (pos > 0) { if (++numbers[pos] < count) goto update_ending; numbers[pos] = 0; } return crypts; } A complete incremental search is going to require CharsetLen^NumChars crypts(), so there are three problems to solve in order to split it into an arbitrary number of shards: * How do you calculate how many crypts() each { length, fixed, count } triplet requires, so I know which one contains the next shard boundary? * What will the state be in inc_key_loop() after n crypts (where 0 < n < total), so that I can skip to the correct position within each entry? * And once I can calculate the position in the search I want this node to search from, how do I use it? Someone asked a similar question here, but I think they abandoned the idea: http://www.openwall.com/lists/johnusers/2011/06/29/3 This sounds like what I wanted. So, the base complexity of an order triplet is pretty simple, it will take pow(count + 1, length) crypts() to complete. This works when fixed=0, but any other value really complicates the calculation. Here is my solution: // Given a { length, fixed, count } triplet, calculate the maximum amount of // work expected in inc_key_loop(). static uint64_t calculate_maximum_attempts(uint32_t length, uint32_t fixed, uint32_t count) { int64_t result; int32_t pos; // A table of coefficients required to compensate for fixed characters in // JtR character files. These were found using Mathematica integer sequence // analsysis. static const int64_t kOrderCoefficients[][8] = { { 0, 0, 0, 0, 0, 0, 0, 0 }, { 0, 1, 0, 0, 0, 0, 0, 0 }, { 0, 2, 1, 0, 0, 0, 0, 0 }, { 0, 3, 3, 1, 0, 0, 0, 0 }, { 0, 4, 6, 4, 1, 0, 0, 0 }, { 0, 5, 10, 10, 5, 1, 0, 0 }, { 0, 6, 15, 20, 15, 6, 1, 0 }, { 0, 7, 21, 35, 35, 21, 7, 1 }, }; // Calculate base complexity of this rule. result = pow(count + 1, length); // Compensate for fixed characters. for (pos = 0; pos < kMaxLength; pos++) { result += kOrderCoefficients[fixed][pos] * pow(count + 1, length  pos); } return result; } The core to allow predicting the inc_key_loop() is to understand that numbers[] is effectively a kdigit basen representation of the number of crypts executed so far (approximately). Again, fixed characters complicate this. The fixed digits can be solved by taking the other digits that would be displaced by the fixed position, then translating them into one base lower than the rest of the digits (!!). Here is my solution: // Given a { length, fixed, count } triplet, calculate the internal state of // inc_key_loop() after the specified number of crypt operations. static bool calculate_number_state(uint8_t *numbers, uint32_t length, uint32_t fixed, uint32_t count, uint64_t crypts) { // Compensate for the number_cache in inc_key_loop() skipping certain // increments of numbers[]. if (fixed != length) crypts = crypts  (crypts % (count + 1)); // Convert number of crypts into a kdigit basen representation // of crypts. This closely matches how John stores it's internal state, // with the exception of fixed digits. convert_to_base(numbers, kMaxLength, count + 1, crypts); // Move the digits above the fixed digit down one position, because the // fixed digit will displace them. This is the same as partial // multiplication of these digits by base, but moving them is more // efficient than having to convert them to integers. memmove(&numbers[0], &numbers[1], fixed); // Now we can force the fixed digit into the correct position. numbers[fixed] = count; // Finally, we need to adjust the digits above the fixed position // independently. This is because John will never set a digit in those // positions to count, however other arithmetic rules still apply. // // We can interpret this algorithm behaviour as setting these digits to one // base lower than the rest of the digits. Yes, this is confusing. convert_to_base(numbers, fixed, count, convert_from_base(numbers, fixed, count + 1)); // Complete. return true; } With these routines you can calculate John's internal state after an arbitrary number of crypts, so to skip to the 81365th shard, you do this (in pseudocode): total_search_space = CharsetLen^NumChars; total_number_shards = 9999999; this_shard_start = 81365 * (total_search_space / total_number_shards); Now skip through order entries until you find the one that would pass the this_shard_start boundary, and then call calculate_number_state() on crypts_expected_after_this_entry  crypts_from_previous_entry_until_boundary. Once you know the entry number and the numbers[] state, I found it simple to generate a john.rec file for each node, and just restore it, immediately jumping to the correct position. I used a format like this: static const char kJohnSessionFormat[] = "REC3\n" // Magic "4\n" // Number of Arguments "incremental=Whatever\n" // Incremental Mode "external=Whatever\n" // External Filter "%s\n" // Password File "0\n" // Time (unused) "0\n" // Count (unused) "00000000\n" // CryptNum Lo (unused) "00000000\n" // CryptNum Hi (unused) "0\n" // Pass (unused) "1\n" // Progress (unused) "%08x\n" // Check (CHR Checksum) "%u\n" // Entry (Incremental) "0\n" // Compat (unused) "8\n" // Length (MaxPasswordLen) "%u\n%u\n%u\n%u\n%u\n%u\n%u\n%u\n"; // Numbers (State) And you can stop after a workunit has completed like this: [List.External:Whatever] int i; void init () { i = 0; } void filter () { abort = i++ > 1234454; } I've attached a .c file demonstrating this, it expects all.chr on stdin and then tries to skip to the final 123456th candidate of each order. I used similar code to generate the next REC file on each node, then collect the results when it's done (I know the FAQ says not to rely on the REC file format, but presumably if you're reading this you're not afraid of reading the code) :) If exposing this via a supported interface (e.g. the config language, or a command line option) has a chance of making the official distribution, let me know and I would be happy to write patch. Hope this helps someone in future. Tavis. p.s. I also have a sha1 implementation that's a little faster than the jumbo version, would this be the right list to send that to? Is there a jumbo cvs repo I can checkout to patch against?   taviso@...xchg8b.com  pgp encrypted mail preferred  #include <stdio.h> #include <stdint.h> #include <string.h> #include <stdbool.h> #include <stdlib.h> #include <math.h> #include <assert.h> // Test code to demonstrate traversal of the incremental search space in John the Ripper. // // Author: taviso@...xchg8b.com, Jun 2012. struct header { uint32_t version; uint32_t check[6]; uint8_t min; uint8_t max; uint8_t length; uint8_t count; uint32_t offsets[8]; struct { uint8_t length; uint8_t fixed; uint8_t count; } order[3420]; } __attribute__((packed)); static const uint32_t kMinLength = 8; static const uint32_t kMaxLength = 8; static const uint32_t kCharsetLen = 95; static const uint32_t kTotalShards = 1 << 21; // static const uint64_t kTotalKeySpace = pow(kCharsetLen, kMaxLength); // $ bc <<< 'obase=16;95^8' // 1791C60F6FED01 static const uint64_t kTotalKeySpace = 0x1791C60F6FED01ULL; // $ bc <<< 'obase=16;(95^8)/(2^21)' // BC8E307B static const uint64_t kWorkUnit = 0xBC8E307BULL; // Convert an array of numdigits integers of a specified base into an integer. static uint64_t convert_from_base(uint8_t *digits, uint8_t numdigits, uint8_t base) { uint64_t result; uint32_t i; for (result = i = 0; numdigits; i++) { result += digits[numdigits] * pow(base, i); } return result; } // Convert an integer into an array of digits. static bool convert_to_base(uint8_t *digits, uint8_t numdigits, uint8_t base, uint64_t value) { int32_t i; // Not possible to represent numbers in an unsigned integer base less than // one, return failure. if (base == 0) return false; // Distribute value digits appropriately. for (i = numdigits  1; i >= 0; i) { digits[i] = value % base; value = value / base; } return true; } // Given a { length, fixed, count } triplet, calculate the maximum amount of // work expected in inc_key_loop(). static uint64_t calculate_maximum_attempts(uint32_t length, uint32_t fixed, uint32_t count) { int64_t result; int32_t pos; // A table of coefficients required to compensate for fixed characters in // JtR character files. static const int64_t kOrderCoefficients[][8] = { { 0, 0, 0, 0, 0, 0, 0, 0 }, { 0, 1, 0, 0, 0, 0, 0, 0 }, { 0, 2, 1, 0, 0, 0, 0, 0 }, { 0, 3, 3, 1, 0, 0, 0, 0 }, { 0, 4, 6, 4, 1, 0, 0, 0 }, { 0, 5, 10, 10, 5, 1, 0, 0 }, { 0, 6, 15, 20, 15, 6, 1, 0 }, { 0, 7, 21, 35, 35, 21, 7, 1 }, }; // Calculate base complexity of this rule. result = pow(count + 1, length); // Compensate for fixed characters. for (pos = 0; pos < kMaxLength; pos++) { result += kOrderCoefficients[fixed][pos] * pow(count + 1, length  pos); } return result; } // Given a { length, fixed, count } triplet, calculate the internal state of // inc_key_loop() after the specified number of crypt operations. // // Note that the state produced may be a few crypts early due to the // number_cache behaviour. This is unavoidable, john does the same thing with // REC files internally. static bool calculate_number_state(uint8_t *numbers, uint32_t length, uint32_t fixed, uint32_t count, uint64_t crypts) { // Compensate for the number_cache in inc_key_loop() skipping certain // increments of numbers[]. if (fixed != length) crypts = crypts  (crypts % (count + 1)); // Convert number of crypts into a kdigit basen representation // of crypts. This closely matches how John stores it's internal state, // with the exception of fixed digits. convert_to_base(numbers, kMaxLength, count + 1, crypts); // Move the digits above the fixed digit down one position, because the // fixed digit will displace them. This is the same as partial // multiplication of these digits by base, but moving them is more // efficient than having to convert them to integers. memmove(&numbers[0], &numbers[1], fixed); // Now we can force the fixed digit into the correct position. numbers[fixed] = count; // Finally, we need to adjust the digits above the fixed position // independently. This is because John will never set a digit in those // positions to count, however other arithmetic rules still apply. // // We can interpret this algorithm behaviour as setting these digits to one // base lower than the rest of the digits. Yes, this is confusing. convert_to_base(numbers, fixed, count, convert_from_base(numbers, fixed, count + 1)); // Complete. return true; } // A simplified version of the inc_key_loop() algorithm from John the Ripper // used for verifying prediction results. static uint64_t inc_key_loop(uint8_t *numbers, uint32_t length, uint32_t fixed, uint32_t count) { int32_t numbers_cache; int32_t pos = 0; uint64_t crypts = 0; numbers[fixed] = count; update_ending: numbers_cache = numbers[length]; update_last: // A crypt() operation would happen here. crypts++; pos = length; if (fixed < length) { if (++numbers_cache <= count) { if (length >= 2) goto update_last; numbers[length] = numbers_cache; goto update_ending; } numbers[pos] = 0; while (pos > fixed) { if (++numbers[pos] <= count) goto update_ending; numbers[pos] = 0; } } while (pos > 0) { if (++numbers[pos] < count) goto update_ending; numbers[pos] = 0; } return crypts; } int main(int argc, char **argv) { struct header header; // Header of CHR file. uint64_t crypts; // Total number of crypt() operations expected. uint64_t estimated; // Estimated number of crypts() required to complete this order entry. uint32_t entry; // Current order index (a table stored in the CHR files). uint64_t result; uint32_t i; // There are two important components of JtR internal state for incremental // cracking. First, the entry number (essentially an index into the order // table of the CHR files). Second is the numbers array, for which the bulk // of the algorithm is in inc_key_loop() in John's inc.c. uint8_t john_numbers_state[kMaxLength]; // Read in CHR file from stdin. fread(&header, sizeof header, 1, stdin); fprintf(stderr, "Version: %08X\n", header.version); fprintf(stderr, "Check: %08X\n", header.check[0]); fprintf(stderr, "Min: %u\n", header.min); fprintf(stderr, "Max: %u\n", header.max); fprintf(stderr, "Length: %u\n", header.length); fprintf(stderr, "Count: %u\n", header.count); fprintf(stderr, "Offsets (ignored):\n"); fprintf(stderr, "\t%08X %08X %08X %08X %08X\n\t...\n", header.offsets[0], header.offsets[1], header.offsets[2], header.offsets[3], header.offsets[4]); for (entry = 0, crypts = 0; entry < (sizeof header.order / sizeof header.order[0]); entry++) { // This logic duplicated from do_incremental_crack() if (header.order[entry].fixed && !header.order[entry].count) continue; if (header.order[entry].length + 1 < kMinLength) continue; if (header.order[entry].length >= kMaxLength) continue; if (header.order[entry].count >= kCharsetLen) continue; // Calculate worst case requirements for crypt() operations this entry // requires. We still need to adjust for 'Fixed' characters, which is // done with the table below. estimated = calculate_maximum_attempts(header.order[entry].length, header.order[entry].fixed, header.order[entry].count); if (estimated < 12345678) continue; crypts = estimated  12345678; // Find the state after num crypt operations, so we can skip to an // arbitrary position. if (calculate_number_state(john_numbers_state, header.order[entry].length, header.order[entry].fixed, header.order[entry].count, crypts) == false) { fprintf(stderr, "error: calculate_number_state() returned failure\n"); return 1; } fprintf(stderr, "\tentry %u { %u, %u, %u }, ~%llu crypt()\n", entry, header.order[entry].length, header.order[entry].fixed, header.order[entry].count, estimated); result = inc_key_loop(john_numbers_state, header.order[entry].length, header.order[entry].fixed, header.order[entry].count); fprintf(stderr, "crypts executed %llu\n", result); } return 0; }
Powered by blists  more mailing lists