john-dev - precomputed attacks for john: rainbow tables and other ways

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150625165947.GB13240@openwall.com>
Date: Thu, 25 Jun 2015 19:59:47 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: precomputed attacks for john: rainbow tables and other ways

Precomputed attacks

An interesting discussion about cracking in general occurred. Results
may become a feature of john someday.


In john now, we try candidates against hashes. If we get new hashes,
we need to try same attacks again trying the same candidates again.
For salted hashes, it is rather ok due to salts (it should be rare
case to meet another hash with the same salt; except for descrypt that
has only 4096 salts). But for hashes without salt, we totally repeat
ourself. What if we could hash candidate and store the results for
later use? We might precompute an attack even not having hashes.

Precomputed attack may be applicable to hashes without salts and with
weak salts (like descrypt).


Straight forward approach: just remember all pairs candidate-hash. We
will quickly run out of memory/space because we tries a lot of
candidates. Also look up may be slower than usual cracking if we got
several hashes. Nevertheless such approach may have its application.
Special kind of that: to store only successful cracks, john does it.


Rainbow tables

Well known example with time-space trade off is rainbow tables: we
store less but look up needs more operations. For me, rainbow tables
are associated with the following properties:
- tables are big
- look up is very slow
- attack is charset based
- rainbow tables are probabilistic: they do not guarantee that hash is
  not from password of this charset
But rainbow tables can be very different! A rainbow with all opposite
properties can be made! And I made some.

Rainbow tables have various parameters that can be tuned to move
balance between look up time and space. Implementations for wordlist
based attacks exist. For small attacks, it is possible to guarantee
coverage of key space.

I made a simple rainbow tables engine to play with parameters. It was
broken but Matt Weir gave crucial hint to me and now it works. The
implementation is attached (t.py).

The attack behind the engine is word combinations. Any attack that has
quick function number->candidate can be implemented easily and be
efficient.

The engine tracks tried candidates and guarantees that check of hash
against the table means that hash has password outside of the attack's
key space. It is possible to track coverage with 1 bit per candidate
so 2^32 candidates need only 512mb of memory (it is only during
generation). But tracking limits tables to attacks of such sizes.
Bigger attacks has to be split into separate tables, so it is not
suitable for huge attacks like 8 ascii characters. But it is possible
to make 4 word pass phrases with 256 most popular words. A pentester
may want to prepare an attack with words from local dictionary, or
topic based. So such precomputed attack may be interesting in
practical use.


Output from my engine

Generation: total count = 16777216, chain_length = 200, color_count = 200
i == 0; l(c) == 0, tried 0 0.000, t/c 0.000, avg 0.000
i == 4096; l(c) == 4003, tried 772016 0.046, t/c 192.859, avg 200.000
[...]
i == 102400; l(c) == 69699, tried 8416460 0.502, t/c 120.754, avg 200.000
[...]
i == 8388608; l(c) == 725665, tried 16481885 0.982, t/c 22.713, avg 200.000
[...]
i == 10854400; l(c) == 805041, tried 16601228 0.990, t/c 20.622, avg 200.000
[...]
i == 15695872; l(c) == 931886, tried 16752150 0.999, t/c 17.977, avg 200.000
[...]
i == 16773120; l(c) == 956305, tried 16777101 1.000, t/c 17.544, avg 200.000
chains 956420
efficiency 0.057
generated in  1197.97685814
end: count 200, time 185.126296997 s, speed 1.08 c/s
brute: time 43.9373371601 s, speed 381844.17 c/s

As a test, I generated a table with 3 word pass phrases from set of
256 words, i.e. 16M candidates. Check of 1 hash needs less than a
second. Space of the table on disk could be 6 mb (or 8 mb with 4 bytes
to store 1 number; we need to store 2 numbers per chain). It is
interesting to tune parameters.

i == 8388608; l(c) == 725665, tried 16481885 0.982, t/c 22.713, avg 200.000

^ That's the point when we tried sequentially first 50% of candidates,
but chains cover 98% of chains. Remaining 2% need so many chains that
we can write down numbers of candidates as is to perform regular crack
then (storing candidate means 1 number, while chain needs 2 numbers,
so we save space and do not increase cracking time much). That's a
trade off available here.

Such engine can be implemented in john not changing current format
interface. A limit of 2^31 candidates per table occurs: the problem is
that we need part of hash, the only method for that is get_hash[](),
get_hash[6]() gives 31 bits, it is maximum. So having 2M c/s for
raw-sha512, 2^31 attack would mean 18 minutes to perform attack as
usual.


Broken implementation

My initial broken implementation is attached too (t_old.py).

My original implementation missed the idea of "colors". So it was not
real _rainbow_ tables. The difference: rainbow tables uses chains of
candidates/hashes, both hash and position in chain are used to produce
next candidate, while I used only hash.

Using only hash to produce next candidate: any collision means that
end of chains are the same, I cut such chains and I get approximately
N/2 chains in the end (where N is the size of attack's key space).
Most chains were very short, but they are almost unavoidable.

Alexander Cherepanov pointed out to me that all candidates consist a
graph with 1 arrow from node in case of single color tables.
Connections are random and depend onto keyspace, hashing function, and
function to map hash into next candidate. Number of chains can't be
lower than number of leafs (nodes with 0 incoming connections).

Idea 1: I can tweak the function to map hash into next candidate:
compute hashes from all candidates in row and remember them, then try
various mapping functions to choose the best. Mapping function can be
flexible (as Pearson hashing) and we can try hill climbing (or other
meta heuristic algorithm) to improve it gradually. Due to randomness
of connections, it should not be possible to improve connections much
not storing a lot of data. But it may be interesting to research these
limits.

Idea 2: if we computed all hashes and remember them, then we can just
construct a minimal perfect hash function to map hashes back into
candidates. We need order preserving mphf. While mphf is not a new
topic, our certain task may have better solution than existing.

Time-memory trade off: we don't really need to have 1 function that
maps hash into candidate. We can map 10% of hashes into their
candidates using 1 function, another 10% with other function and so
on, thus we need to check 10 functions (to perform 10 hashing: given
hash -> candidate -> computed hash -> comparison with given hash). We
can reduce total space used because building a function picks more
comfortable candidates that need less space. An experiment is needed.


Conclusions

- rainbow tables can be suitable for rather fast checks like 1s per hash,

- rainbow tables can work with various attacks including attacks onto
pass phrases,

- rainbow tables can be implemented in john not changing current
format interface (limiting tables to 2^31 candidates),

- rainbow tables can be deterministic, i.e. guarantee of full attack
application: check against table means check of attack for 100% of
candidates,

- now it looks like precomputed attacks with small rainbow tables may
get practical use, but more tests with parameters of rainbow tables
are needed: does not fast look up mean a lot of space for table?

- other approaches are interesting too.


Ideas?


Thanks!

-- 
Regards,
Aleksey Cherepanov

View attachment "t_old.py" of type "text/x-python" (4967 bytes)

View attachment "t.py" of type "text/x-python" (4960 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.