Date: Sun, 10 Jul 2016 00:46:14 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Loading a large password hash file

On Thu, Jul 07, 2016 at 12:38:05AM -0400, Matt Weir wrote:
> More of a general question, but what should the default behavior of JtR be
> when you give it an unreasonably large password hash file to crack?

It doesn't know what's reasonable and what's not on a given system and
for a given use case, so it just keeps trying.  Do you feel the default
should be different?

> For example, let's say you give it 270 million Sha1 hashes?

This isn't necessarily unreasonable.  It should load those if memory
permits.  I guess this is related to:

http://reusablesec.blogspot.com/2016/07/cracking-myspace-list-first-impressions.html

In that blog post, you write that after "sort -u" you had an 8 GB file,
which means about 200 million unique SHA-1 hashes.  So I just generated
a fake password hash file using:

perl -e 'use Digest::SHA1 qw(sha1_hex); for ($i = 0; $i < 200000000; $i++) { print sha1_hex($i), "\n"; }'

which is 8200000000 bytes (200 million lines of 41 bytes each: 40 hex
characters plus a newline).  On a machine with enough RAM, JtR loaded
it in 6 minutes, and the running "john" process uses 13 GB.
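
In case someone wants to reproduce this, the invocation would be
something like the below (the filename is made up, and raw-sha1 is
jumbo's format name for raw SHA-1 hashes in hex):

./john --format=raw-sha1 fake-sha1.txt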

I guess the loading time could be reduced by commenting out "#define
REVERSE_STEPS" in rawSHA1_fmt_plug.c and rebuilding, but I haven't tried
that.  Maybe we should optimize a few things in that format to speed up
the loading.
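
(If you want to experiment with that, the edit is just commenting out
the line in rawSHA1_fmt_plug.c:

//#define REVERSE_STEPS

and then rebuilding, e.g. with "make -s clean && make -sj4" from the
src/ directory.  Untested, as said above.)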

> Currently if I
> leave it running for a day or two it just hangs trying to process the file.

That's unreasonable.

> This was with bleeding-jumbo.
> 
> Aka I realize the hash file was way too big. Heck the file was large enough
> I couldn't fit the whole thing in RAM on the machine I was using.

Clearly, you need more RAM, or you could probably load half that file at
a time.
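
For the latter, something like this would do (GNU split; the line count
and file names are just an example matching a ~200M line file):

split -d -l 100000000 sha1-hashes.txt part-
./john --format=raw-sha1 part-00
./john --format=raw-sha1 part-01

Both runs share the same john.pot by default, so the passwords cracked
against the two halves end up pooled anyway.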

There's also the --save-memory option, which may actually speed things
up when you don't have enough RAM.  But that's sub-optimal, and high
memory saving levels may hurt cracking speed a lot.  They also hurt
loading time when there would have been enough RAM to load the hashes
without memory saving.  I've just tried --save-memory=2 on the 200M
SHA-1 file, and it looks like it'll load in about 1 hour (instead of
6 minutes), consuming something like 11 GB.  So probably not worth it in
this case.
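
(That test was with essentially this, filename as in the example above:

./john --format=raw-sha1 --save-memory=2 fake-sha1.txt

where the accepted levels are 1 to 3, higher meaning more aggressive
memory saving.)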

> I'm more curious about how JtR should respond to that situation.

I think the current behavior is fine.  There are many OS-specific ways
in which the memory available to a process could be limited, and indeed
the RAM vs. swap distinction is also system-specific.  It'd add quite
some complexity to try and fetch and analyze that info, and to try and
guess (possibly wrongly) what the user's preference would be.
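
(For example, a user who does want a hard cap can impose one from the
outside, e.g. with the shell's ulimit before invoking john - a sketch,
with a placeholder filename:

ulimit -v 16000000
./john --format=raw-sha1 huge.txt

Here ulimit -v caps the process address space at roughly 16 GB (the
value is in KB); cgroups would be another way.  That's the user's call
to make, not john's.)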

Alexander
