john-users - approaches to use old password as baseword for new hash matching by username/login (as in CMIYC 2019)

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <87v9t17pfq.fsf@gmail.com>
Date: Sun, 06 Oct 2019 20:29:29 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-users@...ts.openwall.com
Subject: approaches to use old password as baseword for new hash matching by username/login (as in CMIYC 2019)

Beware of spoilers.

In "Crack Me If You Can 2019" online hash cracking competition held at
DEFCON, KoreLogic made a distilled task to reflect real life scenario:
attacker got old and new hashes with usernames/logins for each user. Old
hashes use weak scheme, new hashes are slow and salted. But the weakness
is in small change between passwords (if any at all). In this scenario,
old passwords are very good basewords to crack new passwords. But the
hashes are too slow to try all old passwords against all new hashes, so
usernames should be used to match old and new hashes to transit
findings.

Working on this task, we (john-users team) got interesting findings not
specific to the contest.

TL;DR:
----------------------------------------------------------------------
In some cases, single mode (aka --single=<rules>) is a good choice to
target attack on individual hashes one by one in a batch. I hope new
options will be introduced to make it the best choice in most cases.

The following tweaks are needed for CMIYC's scenario:
- pack file with hashes as 'baseword:hash' or ':hash:::baseword',
  - other separator may be needed (e.g. --field-separator-char='\x01'),
- use 'SingleRetestGuessed = N' or --single-retest-guess=N,
- use 'SingleWordsPairMax = 0' and then 'JumboSingleWords = N',
- for ':hash:::baseword' variant, add 'PristineGecos = Y' and then
  'JumboSingleWords = N',
- use --single=none to try basewords without default mutations.

Simple scripting is not very bad performance wise, but has terrible UX.
----------------------------------------------------------------------

More info about the contest and its hashes may be found there:
https://contest-2019.korelogic.com/stats-hashsets.html

Described findings are not documentation. It is a representation of
current state. Described behaviour may be changed later.

______________________________________________________________________
Findings shortly:

- Sometimes new password is a small variation of old password, so it may
  be trivial to find new password having old password.

- So let's say we cracked the fast hashes and have thousands of old
  passwords together with usernames. We need to try each old password
  against respective new hash with the same username (approximately
  1:1).

- We could try all old passwords against all new hashes. It may have
  sense when hashes are from one source.

  - But it is not the case in CMIYC. It would be too slow to try all old
    passwords as basewords for all new hashes.


- We could extract new hash and respective old password to run separate
  attack. It should be easy with some scripting, right? Let's do a loop
  over username:old_password pairs, put old_password into file and pick
  target hashes by username.

  - File with username:old_password pairs was produced by other script
    as a part of centralized processing of team's results.

  - Good news: john's --users=... option allows to pick hash
    automatically by username or usernames.

    - In CMIYC, there were some users with suffixes "-a", "-b", "-c", so
      --users="$u,$u-a,$u-b,$u-c" was used to pick all variants.

  - Extreme case of 1 candidate for 1 hash in attack has noticeable
    overhead (e.g. when old password is the perfect match for new hash
    or there is 100% reliable rule to get new password from old). But it
    is not a problem when one needs to mutate old password in different
    way and try all variants.

  - The script got several improvements over time, so the task is a bit
    harder than it seems.

    - Also naive implementation would be suboptimal for recurring salts.
      In CMIYC, all salts were different.

  - It is hard to run it in parallel.

    - Terrible usability!

    - After all, a dispatcher was written to run as many loops as CPU
      threads available and to pick parts of work from a shared pool. As
      a bonus, it could run loops on remote machines (but pool could not
      be shared with other team members). It was terrible too, but it
      was much much better than one loop. The dispatcher showed how much
      we missed in CMIYC using the loop as the only approach and
      delaying implementation of dispatcher.


- Alternatively we could pack old passwords and new hashes together as
  "old_password:new_hash" into one file and use single mode (--single),
  so john would try old password against only 1 respective new hash.

  - --single option may take parameter with name of rules section or
    with inline rules, so --single=None or --single=':' would try old
    password without modifications.

  - But there is a caveat: by default, each successful crack would be
    tried against all other new hashes in the file.
    --single-retest-guess=N on command line or 'SingleRetestGuessed = N'
    in config can disable it.

  - But some of old passwords contain ':', that is default field
    separator in files with hashes for john.

    - It is possible to inspect candidates in single mode with debugging
      format 'plaintext' (tag "$0$") and --verbosity=6: try to attack
      'a,y:$0$p:b:c:d,x:e:f:g:h:i:j:k' with --single=None --verbosity=6.

    - It is possible to change the field separator with option
      --field-separator-char=C.

      - But the same field separator would be used .pot file. So it
        may be tricky to process results.

      - It is possible to specify the char in hex as \xNM, e.g.
        --field-separator-char='\x01'.

        - None of passwords in CMIYC contained '\x01'. So it was
          possible to use only '\x01' and ':', it would make scripts for
          processing quite easy.

    - It is possible to put old password into GECOS field (e.g. like
      ':new_hash:::old_password'), so the hash would be loaded but the
      baseword with ':' would be cut.

      - Leaving username in place (e.g. like 'u:h:::gecos') would
        produce more candidates.

      - Also 'PristineGecos = Y' and then 'JumboSingleWords = N' are
        required to reduce number of candidates. See other point.

    - 'PristineGecos = Y' does not solve the problem with ':'. It has
      other meaning. See other point.

    - There were not many passwords in CMIYC with ':', so it was
      possible to ignore them for single mode and cover by scripting
      approach.

  - Any separator chars (roughly anything that is not in set A-Za-z0-9)
    in username or in GECOS field gives additional candidates because
    john extracts words.

    - "a,b" in login field produces "a,b", "a,ba", "aa", "a,bb", "ab", "a",
      "aa,b", "b", "ba,b", "ba".

    - 'JumboSingleWords = Y' option in config would increase splitting
      (e.g. like JEdgarHoover to J Edgar Hoover).

    - "a,b" in GECOS field produces "a", "ab", "b", "ba".

      - 'PristineGecos = Y' in config would keep variants with
        separators as within login field.

        - 'PristineGecos = Y' turns on JumboSingleWords. So additional
          'JumboSingleWords = N' is needed then.

    - 'SingleWordsPairMax = 0' reduces the number of additional
      candidates, but it does not disable splitting fully. Log shows
      that the option is increased automatically to cover more
      candidates than format's max. keys per crypt (mkpc).

      - 'SingleWordsPairMax = 0' turns on JumboSingleWords. So
        additional 'JumboSingleWords = N' is needed then.

      - mkpc is 8 for django-scrypt and plaintext formats. --mkpc=1
        option does not affect the increase of SingleWordsPairMax for
        these formats.

    - I don't know any way to disable extraction of words without
      patching. Is there any?

    - Many old passwords in CMIYC had special chars, so single mode is
      slower than perfect.

      - In CMIYC, some sets of old passwords did not contain separator
        chars at all, so there would not be the performance penalty from
        single mode working on them.

      - Below there are rough tests with CMIYC's "putty" set (almost all
        passwords have '?', like "23?d?d1036?d") and bogus words without
        separators, with one candidate per hash (from --rules=':Az"?a"')
        and multiple candidates from bogus rules:

        - 1 candidate per hash, with separators, most are cracks: single
          mode with all tweaks is ~3x faster than the script,

        - 1 candidate per hash, with separators, no cracks: single
          mode with all tweaks is ~2x slower than the script,

        - multiple candidates per hash, with separators: single mode
          with all tweaks is ~11x slower than the script,

        - multiple candidates per hash, without separators: single mode
          is close to the perfect time, the script is slightly slower.

  - Single mode with 1 candidate per hash does not perform well with
    --fork= option: only one thread does all the work, others exit
    immediately.

- Alternatively username matcher could be implemented right in john, so
  it would be possible to combine files of format 'username:new_hash'
  and of format 'username:old_password'. Maybe we'll see this feature in
  john. :-)

______________________________________________________________________
Some examples:

Example of our script (reformatted):
----------------------------------------------------------------------
while IFS=: read -r u p; do
    printf '%s\n' "$p" > twl &&
    grep -- "$u" results/uncracked/0.django-scrypt.slow-salted.target.pw &&
    echo "$u" &&
    ./JohnTheRipper/run/john \
        --users="$u,$u-a,$u-b,$u-c" \
        results/uncracked/0.*.target.pw \
        --wordlist=twl \
        --rules=': sq1 sw2 se3 sr4 st5 sy6 su7 si8 so9 sp0';
done < results/pair_user_crack/14.raw-md5.fast-nosalt.log2.txt
----------------------------------------------------------------------

So we read user:password pairs of cracked hint hashes, save password
into temporary wordlist, pick users from the whole target file with
--users="$u,$u-a,$u-b,$u-c". I added grep for speed. This version was
written after the contest. One more improvement is possible: redirect
grep into temporary file to be used by john instead of the full file.

The script was supposed to be adapted for every pack of hints
separately. It was a flexible approach. But it was hard to manage.

The script was hard to run in parallel. But it had bearable performance
for packs of hint hashes with one 100% good rule as the following packs:
- loga3:  --rules=':R'
- putty:  --mask="?w??a" (or --rules=':Az"?a"')
- Alaska: md5($p)
- Log2:   --rules=': sq1 sw2 se3 sr4 st5 sy6 su7 si8 so9 sp0'

But JBJ pack had a set of different rules to convert passwords from hint
to passwords for target. It was the point when it should be obvious that
performance mattered.

Introspecting into generation of additional candidates in single mode:
----------------------------------------------------------------------
$ echo 'a:$0$p:b:c:d:e:f:g:h:i:j:k' > plaintext.pw
$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=:
[...]
set_key(a, 0)
set_key(ad, 1)
set_key(d, 2)
set_key(da, 3)
[...]

$ echo 'abc123:$0$p' > plaintext.pw
$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=:
[...]
set_key(abc123, 0)
[...]

$ echo 'abc,123:$0$p' > plaintext.pw
$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=:
[...]
set_key(abc,123, 0)
set_key(abc,123abc, 1)
set_key(aabc, 2)
set_key(abc,123123, 3)
set_key(a123, 4)
set_key(abc, 5)
set_key(abcabc,123, 6)
set_key(aabc,123, 7)
Almost done: Processing the remaining buffered candidate passwords, if any.
set_key(abc123, 0)
set_key(123, 1)
[...]
----------------------------------------------------------------------

Effect of 'SingleWordsPairMax = 0':
----------------------------------------------------------------------
$ echo 'a,b,c,d:$0$p' > plaintext.pw
$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=: 2>&1 | grep -c 'set_key('
26

$ echo 'a,b,c,d:$0$p' > plaintext.pw
$ printf '.include <john.conf>\n[Options]\nSingleWordsPairMax = 0\n' > t.conf
$ cat t.conf
.include <john.conf>
[Options]
SingleWordsPairMax = 0

$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=: --config=t.conf --log-stderr
[...]
0:00:00:00 - SingleWordsPairMax increased to 3 for high KPC (8)
0:00:00:00 - SingleWordsPairMax used is 3
[...]

$ ./JohnTheRipper/run/john plaintext.pw --verbosity=6 --single=: --config=t.conf 2>&1 | grep -c 'set_key('
15
----------------------------------------------------------------------

Let's measure speeds. single_9_nocracked.pw was made from "putty" set
and target hashes (with 180 hashes replaced by new variants; old
variants were not included). The combination is not accurate, so I'll
omit it. Script is modified to work with this one file.
----------------------------------------------------------------------
$ wc -l single_9_nocracked.pw
1751 single_9_nocracked.pw

$ head -n 1 single_9_nocracked.pw
?d?d3?d?d3966:scrypt$J2yu/dctwknD$15$8$1$64$Tbtp3zmykGFe/rrg70yQbffw/ssrM1wEsS53zI40QN6n3evK4tENbechTGTdWRhb5/S/lOH+YwTcF+XYfAhrDQ==

$ rm s9.pot ; time ./JohnTheRipper/run/john --no-log single_9_nocracked.pw --single=':Az"?a"' --pot=s9.pot
[...]
(It stalls quickly due to retests)

$ rm s9.pot ; time ./JohnTheRipper/run/john --no-log single_9_nocracked.pw --single=':Az"?a"' --pot=s9.pot --single-retest-guess=N
[...]
1571g 0:00:04:06 DONE (2019-10-06 16:13) 6.370g/s 19.53p/s 20.26c/s 20.26C/s dd39?a

$ printf '.include <john.conf>\n[Options]\nSingleWordsPairMax = 0\nJumboSingleWords = N\n' > swpm0.conf
$ cat swpm0.conf
.include <john.conf>
[Options]
SingleWordsPairMax = 0
JumboSingleWords = N

$ rm s9.pot ; time ./JohnTheRipper/run/john --no-log single_9_nocracked.pw --single=':Az"?a"' --pot=s9.pot --single-retest-guess=N --config=swpm0.conf
[...]
1571g 0:00:03:09 DONE (2019-10-06 16:17) 8.279g/s 19.33p/s 20.27c/s 20.27C/s 10?a

$ rm s9.pot ; time sh -c 'while IFS=: read p h; do echo "$p" > twl && echo "$h" > t.pw && ./JohnTheRipper/run/john --no-log t.pw --wordlist=twl --rules=":Az~?a~" --pot=s9.pot --format=django-scrypt; done < single_9_nocracked.pw 2>/dev/null 1>/dev/null'; wc -l s9.pot

real	9m56.385s
user	4m46.458s
sys	2m13.355s
1571 s9.pot
----------------------------------------------------------------------

Rough estimate would be 88s to try 1 candidate per hash for 1751 hashes
with unique salts at 20 c/s. All results together:
- rough estimation:   1m28s
- single, no retest:  4m06s
- + swpm0.conf:       2m58s (the crack is the first in bunch)
- with :Az"no" rule: 16m08s (all additional candidates are tried)
- modified script:    9m56s

Let's pick 20 hashes and try 100x more rules (':Az~[0-9][0-9]~'):
- rough estimation:   1m40s
- single, no retest: 28m12s
- + swpm0.conf:      18m29s
- modified script:    1m44s

Let's replace basewords with words without delimiters:
----------------------------------------------------------------------
$ head -n 20 single_9_nocracked.pw | perl -C0 -pe 's/^[^:]*:/abc$.qwe:/' > t.pw
----------------------------------------------------------------------

So 20 hashes with 100 rules, basewords don't contain separators:
- rough estimation:   1m40s
- single, no retest:  1m40s
- + swpm0.conf:       1m40s
- modified script:    1m44s

Stats about separators against our cracks including cracks found after
the contest (# with separators, total #, file name):
----------------------------------------------------------------------
$ for f in results/cracked/[1-9]* ; do \
    printf '%5d %5d  %s\n' \
           "$(grep -c '[^A-Za-z0-9]' "$f")" \
           "$(wc -l < "$f")" \
           "$(basename $f)"; \
done
(w/sep total  file name)
  612  1508  10.raw-sha1.fast-nosalt.alaska.txt
    4    21  11.nt.fast-nosalt.alaska.txt
  147   957  12.nt.fast-nosalt.bluemangroup.txt
  200   932  13.raw-sha1.fast-nosalt.coredump.txt
    0  1001  14.raw-md5.fast-nosalt.log2.txt
  168   758  1.raw-md5.fast-nosalt.seeessseeessteedee.txt
 1418  3770  2.nt.fast-nosalt.buckbuckbuck.txt
    0  1001  3.raw-md5.fast-nosalt.log1.txt
    0  1317  4.raw-md5.fast-nosalt.speak.txt
  743  1801  5.raw-sha1.fast-nosalt.s8sux.txt
 3655  9971  6.nt.fast-nosalt.jbj.txt
    0  1001  7.raw-md5.fast-nosalt.loga3.txt
 8186  9868  8.raw-md5.fast-nosalt.wolf.txt
 2305  2308  9.mysql-sha1.fast-nosalt.putty.txt
----------------------------------------------------------------------

Thanks!

--
Regards,
Aleksey Cherepanov
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.