
Date: Thu, 25 Jun 2015 19:59:47 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: precomputed attacks for john: rainbow tables and other ways

Precomputed attacks

An interesting discussion about cracking in general took place. The
results may become a feature of john someday.

In john now, we try candidates against hashes. If we get new hashes,
we need to run the same attacks again, trying the same candidates
again. For salted hashes this is mostly fine because of the salts (it
should be rare to meet another hash with the same salt, except for
descrypt, which has only 4096 salts). But for hashes without salts, we
totally repeat ourselves.

What if we could hash candidates and store the results for later use?
We might precompute an attack even before having any hashes. A
precomputed attack may be applicable to hashes without salts and with
weak salts (like descrypt).

Straightforward approach: just remember all candidate-hash pairs. We
will quickly run out of memory/space because we try a lot of
candidates. Also, lookup may be slower than usual cracking if we have
several hashes.

Nevertheless, such an approach may have its applications. A special
case of it is storing only successful cracks, which john already does.

Rainbow tables

A well-known example of a time-space trade-off is rainbow tables: we
store less, but a lookup needs more operations.

For me, rainbow tables are associated with the following properties:
- tables are big
- lookup is very slow
- the attack is charset based
- rainbow tables are probabilistic: a failed lookup does not guarantee
  that the hash is not from a password of this charset

But rainbow tables can be very different! A rainbow table with all the
opposite properties can be made! And I made some.

Rainbow tables have various parameters that can be tuned to shift the
balance between lookup time and space. Implementations for
wordlist-based attacks exist. For small attacks, it is possible to
guarantee coverage of the key space.

I made a simple rainbow tables engine to play with parameters.
It was broken, but Matt Weir gave me a crucial hint and now it works.
The implementation is attached (t.py). The attack behind the engine is
word combinations. Any attack that has a quick number -> candidate
function can be implemented easily and be efficient.

The engine tracks tried candidates and guarantees that if a check of a
hash against the table fails, the password for that hash is outside of
the attack's key space.

It is possible to track coverage with 1 bit per candidate, so 2^32
candidates need only 512 MB of memory (and only during generation).
But tracking limits tables to attacks of such sizes. Bigger attacks
have to be split into separate tables, so this is not suitable for
huge attacks like 8 ASCII characters. But it is possible to do 4-word
pass phrases from the 256 most popular words. A pentester may want to
prepare an attack with words from a local dictionary, or a topic-based
one. So such a precomputed attack may be interesting in practical use.

Output from my engine

Generation: total count = 16777216, chain_length = 200, color_count = 200
i == 0; l(c) == 0, tried 0 0.000, t/c 0.000, avg 0.000
i == 4096; l(c) == 4003, tried 772016 0.046, t/c 192.859, avg 200.000
[...]
i == 102400; l(c) == 69699, tried 8416460 0.502, t/c 120.754, avg 200.000
[...]
i == 8388608; l(c) == 725665, tried 16481885 0.982, t/c 22.713, avg 200.000
[...]
i == 10854400; l(c) == 805041, tried 16601228 0.990, t/c 20.622, avg 200.000
[...]
i == 15695872; l(c) == 931886, tried 16752150 0.999, t/c 17.977, avg 200.000
[...]
i == 16773120; l(c) == 956305, tried 16777101 1.000, t/c 17.544, avg 200.000
chains 956420
efficiency 0.057
generated in 1197.97685814
end: count 200, time 185.126296997 s, speed 1.08 c/s
brute: time 43.9373371601 s, speed 381844.17 c/s

As a test, I generated a table for 3-word pass phrases from a set of
256 words, i.e. 16M candidates. Checking 1 hash takes less than a
second. The table could take 6 MB on disk (or 8 MB with 4 bytes to
store 1 number; we need to store 2 numbers per chain).
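The size figures above can be checked with a little arithmetic. The following is a minimal Python 3 sketch, not part of the attached engine; the helper names are hypothetical, and it assumes each chain is stored as 2 numbers (start and end), as the text says, rather than as a start number plus a full hash.

```python
def bytes_per_number(key_space_size):
    """Smallest whole number of bytes able to index every candidate."""
    bits = max(1, (key_space_size - 1).bit_length())
    return (bits + 7) // 8


def table_bytes(chain_count, number_size):
    """Assumption: each chain is stored as 2 numbers, its start and its end."""
    return chain_count * 2 * number_size


def coverage_bitmap_bytes(key_space_size):
    """Coverage tracking with 1 bit per candidate, rounded up to whole bytes."""
    return (key_space_size + 7) // 8


key_space = 256 ** 3   # 3 words out of 256 -> 16777216 candidates
chains = 956420        # taken from the "chains" line of the output above

packed = table_bytes(chains, bytes_per_number(key_space))  # 3 bytes per number
padded = table_bytes(chains, 4)                            # 4 bytes per number
bitmap_mib = coverage_bitmap_bytes(2 ** 32) // 2 ** 20

print(packed, padded)  # 5738520 7651360, i.e. roughly 6 MB and 8 MB
print(bitmap_mib)      # 512 (MiB of tracking memory for 2^32 candidates)
```

With 16M candidates a number fits in 3 bytes, which is where the smaller figure comes from; padding to 4 bytes gives the larger one.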
It is interesting to tune parameters.

i == 8388608; l(c) == 725665, tried 16481885 0.982, t/c 22.713, avg 200.000
^
That's the point when we have sequentially tried the first 50% of
candidates as chain starts, but the chains already cover 98% of
candidates. The remaining 2% need so many chains that we can instead
write down the numbers of those candidates as is and then crack them
the regular way (storing a candidate takes 1 number, while a chain
needs 2 numbers, so we save space and do not increase cracking time
much). That's a trade-off available here.

Such an engine can be implemented in john without changing the current
format interface. A limit of 2^31 candidates per table arises: we need
part of the hash, the only method for that is get_hash[](), and
get_hash[6]() gives 31 bits, which is the maximum. So at 2M c/s for
raw-sha512, a 2^31 attack would take 18 minutes to perform as usual.

Broken implementation

My initial broken implementation is attached too (t_old.py). It missed
the idea of "colors", so it was not real _rainbow_ tables. The
difference: rainbow tables use chains of candidates/hashes where both
the hash and the position in the chain are used to produce the next
candidate, while I used only the hash.

Using only the hash to produce the next candidate: any collision means
that the ends of the chains are the same. I cut such chains and got
approximately N/2 chains in the end (where N is the size of the
attack's key space). Most chains were very short, but they are almost
unavoidable.

Alexander Cherepanov pointed out to me that, in the case of
single-color tables, all candidates form a graph with exactly 1 arrow
out of each node. Connections are random and depend on the key space,
the hashing function, and the function that maps a hash into the next
candidate. The number of chains can't be lower than the number of
leaves (nodes with 0 incoming connections).

Idea 1: I can tweak the function that maps a hash into the next
candidate: compute hashes of all candidates in a row and remember
them, then try various mapping functions to choose the best.
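As an illustration of Idea 1, here is a minimal Python 3 sketch under stated assumptions: a hypothetical toy key space of 2-word phrases over 16 words, MD5 as the hash, and a hypothetical family of mapping functions that differ only in which 2 digest bytes they read. It hashes every candidate once, counts the leaves each mapping produces in the single-color graph, and keeps the mapping with the fewest. It is not taken from the attached files.

```python
import hashlib

# Hypothetical toy key space: 2-word phrases over 16 words (256 candidates).
WORDS = ["a" + str(i) for i in range(16)]
TOTAL = len(WORDS) ** 2


def number_to_candidate(number):
    first, second = number % len(WORDS), number // len(WORDS)
    return WORDS[first] + " " + WORDS[second]


def make_mapping(offset):
    """Return a hash -> number function that reads 2 digest bytes at `offset`."""
    def mapping(digest):
        return int.from_bytes(digest[offset:offset + 2], "big") % TOTAL
    return mapping


def count_leaves(digests, mapping):
    """Count candidates that no candidate maps to (0 incoming arrows)."""
    reached = {mapping(d) for d in digests}
    return TOTAL - len(reached)


# Hash every candidate once and remember the digests.
digests = [hashlib.md5(number_to_candidate(n).encode("utf-8")).digest()
           for n in range(TOTAL)]

# Try several mapping functions and keep the one with the fewest leaves,
# since the number of leaves is a lower bound on the number of chains.
leaves_by_offset = {off: count_leaves(digests, make_mapping(off))
                    for off in range(15)}
best_offset = min(leaves_by_offset, key=leaves_by_offset.get)
print(leaves_by_offset)
print("best offset:", best_offset, "leaves:", leaves_by_offset[best_offset])
```

The candidates all behave like random mappings, so the leaf counts are expected to differ only slightly between offsets.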
The mapping function can be flexible (like Pearson hashing), and we
can try hill climbing (or another metaheuristic algorithm) to improve
it gradually. Due to the randomness of connections, it should not be
possible to improve connections much without storing a lot of data.
But it may be interesting to research these limits.

Idea 2: if we computed all hashes and remembered them, then we can
just construct a minimal perfect hash function (MPHF) to map hashes
back into candidates. We need an order-preserving MPHF. While MPHFs
are not a new topic, our particular task may have a better solution
than the existing ones.

Time-memory trade-off: we don't really need a single function that
maps a hash into a candidate. We can map 10% of hashes into their
candidates using 1 function, another 10% with another function, and so
on; thus we need to check 10 functions (performing 10 hashings: given
hash -> candidate -> computed hash -> comparison with the given hash).
We can reduce the total space used because building a function picks
more comfortable candidates that need less space. An experiment is
needed.

Conclusions

- rainbow tables can be suitable for rather fast checks like 1s per
  hash,
- rainbow tables can work with various attacks including attacks on
  pass phrases,
- rainbow tables can be implemented in john without changing the
  current format interface (limiting tables to 2^31 candidates),
- rainbow tables can be deterministic, i.e. guarantee full application
  of the attack: a check against the table means a check of the attack
  for 100% of candidates,
- now it looks like precomputed attacks with small rainbow tables may
  get practical use, but more tests with parameters of rainbow tables
  are needed: doesn't fast lookup mean a lot of space for the table?
- other approaches are interesting too.

Ideas?

Thanks!

--
Regards,
Aleksey Cherepanov


# -*- coding: utf-8 -*-
# Simple rainbow tables, experiment / toy example

# We will precompute attack: "word1 word2 word3" where words are
# picked from a list of 256 words. 2^24 candidates is a toy example.
# That's too much for toy example. Let's use 2 words.

# Copyright © 2015 Aleksey Cherepanov <lyosha@...nwall.com>
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted.

# Tunable parameters

word_count = 2
words = ["a" + str(i) for i in range(256)]
chain_length = 20

# Code starts here

import hashlib
import sys
import itertools
import time

total_count = len(words) ** word_count

def number_to_candidate(number):
    ws = []
    l = len(words)
    for i in range(word_count):
        ws.append(words[number % l])
        number /= l
    return " ".join(ws)

def hash_to_number(h):
    k = 256
    # h = h[5:]
    r = ord(h[0])
    i = 1
    while k < total_count:
        k <<= 8
        r <<= 8
        r += ord(h[i])
        i += 1
    r %= total_count
    return r

def generate():
    start = time.time()
    tried = [False for i in range(total_count)]
    chains = []
    print >> sys.stderr, "Generation: total count = {0}, chain_length = {1}".format(total_count, chain_length)
    k = 0
    # for i in range(total_count):
    i = 0
    while i < total_count:
        if k & 0xFFF == 0:
            t = tried.count(True)
            l = len(chains)
            tt = 0
            ef = 0
            avg_len = 0
            if l > 0:
                tt = float(t) / l
                ef = float(l) / i
                avg_len = float(sum(c[2] for c in chains)) / l
            print >> sys.stderr, "i == {0}; l(c) == {1}, tried {2} {3:.3f}, t/c {4:.3f}, avg {5:.3f}".format(k, l, t, float(t) / total_count, tt, avg_len)
        if not tried[i]:
            ni = i
            for n in range(chain_length):
                w = number_to_candidate(ni)
                tried[ni] = True
                h = hashlib.md5(w).digest()
                ni = hash_to_number(h)
                # If new candidate is already tried then we stop and
                # save short chain.
                if tried[ni]:
                    # if ni in [tc[0] for tc in chains]:
                    #     print >> sys.stderr, "hi there"
                    break
            # Save chain:
            # i - initial number,
            # h - final hash,
            # n - length of chain.
            chains.append((i, h, n + 1))
            if len(chains) > total_count:
                print >> sys.stderr, "too many chains"
                break
        i += 1
        k += 1
    end = time.time()
    print >> sys.stderr, "chains", len(chains)
    print >> sys.stderr, "efficiency {0:.3f}".format(float(len(chains)) / total_count)
    print >> sys.stderr, [c[2] for c in chains[0:10]]
    print >> sys.stderr, "generated in ", end - start
    return chains

def prepare(chains):
    # We repack chains to be easy for lookups.
    r = {}
    for c in chains:
        i, h, n = c
        r[h] = (i, n)
    return r

def crack(h, chains_hash):
    orig_h = h
    for i in range(chain_length):
        if h in chains_hash and chains_hash[h][1] > i:
            # Possible chain found.
            ch = h
            cn = chains_hash[h][0]
            for t in range(chains_hash[h][1]):
                cw = number_to_candidate(cn)
                ch = hashlib.md5(cw).digest()
                if ch == orig_h:
                    # We found the password!
                    return cw
                cn = hash_to_number(ch)
        ni = hash_to_number(h)
        w = number_to_candidate(ni)
        h = hashlib.md5(w).digest()
    return None

def check_good(chains_hash):
    k = 0
    for ws in itertools.product(*([words] * word_count)):
        w = " ".join(ws)
        h = hashlib.md5(w).digest()
        r = crack(h, chains_hash)
        if r != w:
            print >> sys.stderr, "check failed on '{0}' with results '{1}', hash is {2}".format(w, r, h.encode('hex'))
            return
        else:
            k += 1
    print >> sys.stderr, "all good, checked", k

chains = generate()
chains_hash = prepare(chains)

check_good(chains_hash)

# print >> sys.stderr, crack(hashlib.md5('asdf').digest(), chains_hash)

test_count = 1000
test1 = [hashlib.md5(str(i)).digest() for i in range(test_count)]
test2 = [hashlib.md5(number_to_candidate(i)).digest() for i in range(test_count)]
test = test1 + test2

start = time.time()
for h in test:
    # %% count good and bad
    crack(h, chains_hash)
end = time.time()
print >> sys.stderr, "end: count {0}, time {1} s, speed {2:.2f} c/s".format(len(test), end - start, float(len(test)) / (end - start))

start = time.time()
for i in range(total_count):
    w = number_to_candidate(i)
    hashlib.md5(w).digest()
    # cmp_all() should be here
end = time.time()
print >> sys.stderr, "brute: time {0} s, speed {1:.2f} c/s".format(end - start, float(total_count) / (end - start))


# -*- coding: utf-8 -*-
# Simple rainbow tables, experiment / toy example

# We will precompute attack: "word1 word2 word3" where words are
# picked from a list of 256 words. 2^24 candidates is a toy example.
# That's too much for toy example. Let's use 2 words.

# Copyright © 2015 Aleksey Cherepanov <lyosha@...nwall.com>
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted.

# Tunable parameters

word_count = 3
words = ["a" + str(i) for i in range(256)]
chain_length = 200
color_count = 200

def hash_it(w):
    return hashlib.md5(w).digest()

# Code starts here

import hashlib
import sys
import itertools
import time

total_count = len(words) ** word_count

def number_to_candidate(number):
    ws = []
    l = len(words)
    for i in range(word_count):
        ws.append(words[number % l])
        number /= l
    return " ".join(ws)

def hash_to_number(h, position_in_chain):
    k = 256
    r = ord(h[0])
    i = 1
    while k < total_count:
        k <<= 8
        r <<= 8
        r += ord(h[i])
        i += 1
    r += position_in_chain % color_count
    r %= total_count
    return r

def generate():
    start = time.time()
    tried = [False for i in range(total_count)]
    chains = []
    print >> sys.stderr, "Generation: total count = {0}, chain_length = {1}, color_count = {2}".format(total_count, chain_length, color_count)
    k = 0
    # for i in range(total_count):
    i = 0
    while i < total_count:
        if k & 0xFFF == 0:
            t = tried.count(True)
            l = len(chains)
            tt = 0
            ef = 0
            avg_len = 0
            if l > 0:
                tt = float(t) / l
                ef = float(l) / i
                avg_len = float(sum(c[2] for c in chains)) / l
            print >> sys.stderr, "i == {0}; l(c) == {1}, tried {2} {3:.3f}, t/c {4:.3f}, avg {5:.3f}".format(k, l, t, float(t) / total_count, tt, avg_len)
        if not tried[i]:
            ni = i
            for n in range(chain_length):
                w = number_to_candidate(ni)
                tried[ni] = True
                h = hash_it(w)
                ni = hash_to_number(h, n)
            # Save chain:
            # i - initial number,
            # h - final hash,
            # n - length of chain.
            chains.append((i, h, n + 1))
            if len(chains) > total_count:
                print >> sys.stderr, "too many chains"
                break
        i += 1
        k += 1
    end = time.time()
    print >> sys.stderr, "chains", len(chains)
    print >> sys.stderr, "efficiency {0:.3f}".format(float(len(chains)) / total_count)
    # print >> sys.stderr, [c[2] for c in chains[0:10]]
    print >> sys.stderr, "generated in ", end - start
    return chains

def prepare(chains):
    # We repack chains to be easy for lookups.
    r = {}
    for c in chains:
        i, h, n = c
        if h not in r:
            r[h] = []
        r[h].append((i, n))
    return r

def crack(h, chains_hash):
    orig_h = h
    for offset in range(color_count):
        h = orig_h
        for i in range(offset, chain_length):
            if h in chains_hash:
                # Possible chains found.
                for chain in chains_hash[h]:
                    ch = h
                    cn = chain[0]
                    for t in range(chain[1]):
                        cw = number_to_candidate(cn)
                        ch = hash_it(cw)
                        if ch == orig_h:
                            # We found the password!
                            return cw
                        cn = hash_to_number(ch, t)
            ni = hash_to_number(h, i)
            w = number_to_candidate(ni)
            h = hash_it(w)
    return None

def check_good(chains_hash):
    k = 0
    for ws in itertools.product(*([words] * word_count)):
        w = " ".join(ws)
        h = hash_it(w)
        r = crack(h, chains_hash)
        if r != w:
            print >> sys.stderr, "check failed on '{0}' with results '{1}', hash is {2}".format(w, r, h.encode('hex'))
            return
        else:
            k += 1
    print >> sys.stderr, "all good, checked", k

chains = generate()
chains_hash = prepare(chains)

# check_good(chains_hash)

# print >> sys.stderr, crack(hash_it('asdf'), chains_hash)

test_count = 100
test1 = [hash_it(str(i)) for i in range(test_count)]
test2 = [hash_it(number_to_candidate(i)) for i in range(test_count)]
test = test1 + test2

start = time.time()
for h in test:
    # %% count good and bad
    crack(h, chains_hash)
end = time.time()
print >> sys.stderr, "end: count {0}, time {1} s, speed {2:.2f} c/s".format(len(test), end - start, float(len(test)) / (end - start))

start = time.time()
for i in range(total_count):
    w = number_to_candidate(i)
    hash_it(w)
    # cmp_all() should be here
end = time.time()
print >> sys.stderr, "brute: time {0} s, speed {1:.2f} c/s".format(end - start, float(total_count) / (end - start))
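The attachments are written for Python 2 (print statements, byte strings indexed with ord(), "/=" as integer division). For readers on Python 3, the two mapping functions of the colored engine might be sketched as below. This is a hedged port under the same parameters, not the author's code; the upper-case names are renamed for this sketch, and candidates are assumed to be encoded as UTF-8 before hashing. It also shows the effect of "colors": the same hash maps to different next candidates at different chain positions.

```python
import hashlib

WORD_COUNT = 3
WORDS = ["a" + str(i) for i in range(256)]
COLOR_COUNT = 200
TOTAL_COUNT = len(WORDS) ** WORD_COUNT


def hash_it(candidate):
    # Assumption: candidates are encoded as UTF-8 before hashing.
    return hashlib.md5(candidate.encode("utf-8")).digest()


def number_to_candidate(number):
    picked = []
    for _ in range(WORD_COUNT):
        picked.append(WORDS[number % len(WORDS)])
        number //= len(WORDS)  # integer division, as "/=" does in Python 2
    return " ".join(picked)


def hash_to_number(digest, position_in_chain):
    # Take as many leading digest bytes as needed to cover the key space.
    k = 256
    r = digest[0]  # indexing bytes gives an int in Python 3, no ord() needed
    i = 1
    while k < TOTAL_COUNT:
        k <<= 8
        r = (r << 8) + digest[i]
        i += 1
    # The "color": the position in the chain shifts the result.
    r += position_in_chain % COLOR_COUNT
    return r % TOTAL_COUNT


digest = hash_it(number_to_candidate(12345))
at_position_0 = hash_to_number(digest, 0)
at_position_1 = hash_to_number(digest, 1)
print(number_to_candidate(12345))    # a57 a48 a0
print(at_position_0, at_position_1)  # two different candidate numbers
```

Because the position is mixed in, two chains that hit the same hash at different positions continue with different candidates and do not merge.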