john-dev - Re: precomputed attacks for john: rainbow tables and other ways

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87v939yh71.fsf@d>
Date: Thu, 09 Sep 2021 17:08:50 +0300
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: precomputed attacks for john: rainbow tables and other ways

Aleksey Cherepanov <aleksey.4erepanov@...il.com> writes:
> I just tried my previous idea about splitting wordlist into parts based
> on prefix of hash. But I stored only indexes of candidates and
> compressed everything.
>
> So to attack a hash, I would pick respective file and try candidates
> from it only. Also I would know that other files would not help. So for
> a particular hash I could say that the attack is exhausted reliably.
> Unlike regular RT, one might attack all hashes with given prefix in
> single run.

> I like that I hashed everything only once. But compression takes a lot
> of time. I am not sure if it scales this way, but I would define such
> possible baseline (with and without compression):
> - ~1024x speed-up, ~3x work to prepare, storing ~2 bytes per candidate,
> - ~1024x speed-up, ~1x work to prepare, storing 4 bytes per candidate.
> All with 100% success.
>
> I think speed-up is very flexible in such method, but for small
> keyspaces compression would degrade. OTOH for larger keyspaces, 4 bytes
> per candidate without compression is not possible. But I guess very fast
> and very specific compression could be used.

Yes, it is possible to pack everything well without many computations.
It is possible to write most significant byte once for a subgroup
getting 2.04 bytes per candidate index. Also it is possible to store
increments between indexes of candidates and pack 2 of them into 3
bytes, getting 1.57 bytes per candidate. 98% of increments are below
4096. 86% are below 2048, so it is possible to pack it better messing
with bits. But that's for 10-bit prefix / 1024 groups.

Choosing bigger prefix would cause bigger gaps between candidates, so
such dense packing would not be possible.

I got some ideas about usability. Let's say we have 2^40 keyspace and
use 4-byte prefix to split candidates into groups. So there are 2^32
groups, on average each group has 256 candidates. We would store 5 bytes
per index of candidate and consume 5TB on hdd (plus several GBs onto
index). Now let's say we get 100k hashes in a contest with all different
prefixes (worst case). Without any computations, we know that (2^32
minus 100k) groups do not contain passwords for these hashes. We have
approximately 100k * 256 candidates to try. That's a few seconds with
single thread on cpu, assuming we have fast hdd and fast function to get
candidate from index.

That's cool but let's calculate hashing speeds: let's choose descrypt as
relatively slow format, I have 3.5M c/s in john on cpu with single
thread. So 100k hashes (with same salt) would take 8 seconds to be
checked against the precomputed attack (assuming fast hdd). 2^40 would
take 87 hours to be prepared with one thread (maybe plus some additional
time to repack everything because 2^32 files seem a bad idea). The
speed-up seems great. But there are GPUs. Some GPUs give 1.4G c/s for
descrypt. That's 785 seconds or 13 minutes to perform just full attack
live.

So with same speed, bigger keyspace and 100% success rate, a precomputed
attack might be competitive to single gpu. But this approach is limited
by space consumption. I am curious if there are other approaches.

For the contests, it looks like small keyspaces are useful only early,
then it is better to run attacks specific to the contest. Early use of
prepared attack means that its approach should have great cracking
speed.

Other use in contests might be pattern discovery to help regular
cracking. It may tolerate low speed of cracking to try bigger keyspace.
Also it may tolerate low success rates, because we would try a small
subset of hashes anyway. For pattern discovery, regular rainbow tables
seem a good fit.

Interesting, LHT is mentioned. LHT stands for "Lossy Hash Table". Also
there is a link to Matt's "Probabilistic Password Cracker". :-)
https://www.tobtu.com/cracker.php

Thanks!

--
Regards,
Aleksey Cherepanov
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.