john-dev - Re: Judy array

Message-ID: <20150915034038.GA319@openwall.com>
Date: Tue, 15 Sep 2015 06:40:39 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array

Fred, magnum -

On Sun, Sep 13, 2015 at 08:28:08PM -0700, Fred Wang wrote:
> I use a 10-year-old Dell 2950 as my test environment, precisely because it uses slower memory, and more easily shows improvements.  For my "standard" test case (MD5, 29 million hashes, a ~13 million entry dictionary, and best64 rules, yielding about 1 billion hash attempts to find about 1.7 million solutions)
> 
> hashcat	3 minutes 54 seconds
> mdxfind	1 minute 15 seconds  (Judy only)
> mdxfind	47 seconds  (Current code, Bloom filter + Judy)

With the attached patch, and running this command line:

time ./john -form=raw-md5 -w=10m.pass -ru=best64 -nolog -mem=999999999 -v=1 -fork=8 29m.txt

I am getting:

real    1m17.021s
user    6m46.399s
sys     0m20.028s

on 2x E5420 with 16 GB RAM.  The 8 processes have about 2 GB allocated
each, which initially means a little over 2 GB of real RAM for all of
them, but as passwords get cracked and pages get copied, the total
memory usage grows to slightly over 8 GB, unfortunately.  The patch
reduces the copy-on-write occurrences; without it, memory usage would be
higher yet.  Of course, this is still not great.
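
To put the memory figures above in perspective, here is a rough sketch of the two extremes for forked processes sharing pages copy-on-write.  The helper name and struct are hypothetical (they are not part of john or of the patch), and the model ignores everything except the large shared allocation:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bounds on total resident memory, in MiB, for
 * nproc forked processes that each map per_proc_mib of the same data. */
struct rss_bounds {
	size_t best_mib;	/* no page written: one shared copy in RAM */
	size_t worst_mib;	/* every page written by every process */
};

static struct rss_bounds fork_rss_bounds(size_t nproc, size_t per_proc_mib)
{
	struct rss_bounds b;

	assert(nproc > 0);
	b.best_mib = per_proc_mib;		/* all pages still shared */
	b.worst_mib = nproc * per_proc_mib;	/* nproc private copies */
	return b;
}
```

For 8 processes at about 2 GB each, fork_rss_bounds(8, 2048) gives 2048 MiB and 16384 MiB.  The observed growth to slightly over 8 GB sits between the two, and the upper bound would be all of this machine's 16 GB.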

The patch shows my changes to john.conf (these are not to be committed).

These were most important:

-Save = 60
+Save = 600
-ReloadAtCrack = Y
+ReloadAtCrack = N
-ReloadAtDone = Y
+ReloadAtDone = N
-ReloadAtSave = Y
+ReloadAtSave = N

Effectively disabling the pot sync feature as above saves several minutes.

This saves 4 seconds, but only when -mem=999999999 is also used:

-WordlistMemoryMap = Y
+WordlistMemoryMap = N

This saves 1 second:

-NoLoaderDupeCheck = N
+NoLoaderDupeCheck = Y

This makes almost no difference (the system is otherwise idle):

-Idle = Y
+Idle = N

I think there might still be wordlist duplicate suppression going on.
It would be nice to try disabling it.

In cracker.c, only the copy-on-write reducing changes actually help in
this benchmark.  I am not entirely confident that they don't break
anything in any other cracking mode, etc. - I'd appreciate some testing
of them in jumbo before I possibly get them into the core tree.
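
For readers unfamiliar with the idea, one common way to reduce copy-on-write in forked processes is to skip stores that would not change the value, since even a no-op write dirties the page and forces the kernel to copy it.  The sketch below illustrates that general technique only.  It is not taken from the patch, and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not from the patch: store a value only if it
 * differs from what is already there, so that a no-op update does not
 * dirty a page shared with other forked processes.
 * Returns 1 if a store was performed, 0 if it was skipped.
 */
static int store_if_changed(unsigned int *slot, unsigned int value)
{
	assert(slot != NULL);
	if (*slot == value)
		return 0;	/* read-only access: page stays shared */
	*slot = value;		/* write: the page is copied if it was shared */
	return 1;
}

/* Clear an array of flags, counting how many stores were really needed. */
static size_t clear_flags(unsigned int *flags, size_t n)
{
	size_t i, stores = 0;

	for (i = 0; i < n; i++)
		stores += store_if_changed(&flags[i], 0);
	return stores;
}
```

The extra compare costs a read, but reads never trigger a page copy, so this tends to pay off when most of the values are already what they would be set to.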

Prefetching doesn't help in this benchmark.  In fact, without it the
running time was 1 second lower:

real    1m16.055s
user    6m45.974s
sys     0m20.433s

The prefetching is in two separate hunks in the patch (one with
"#include <emmintrin.h>" and the other with the actual code), so perhaps
both should be excluded for now.
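
For context, software prefetching in a lookup loop generally has the shape sketched below: request the cache line for a later key while testing the current one.  This is a generic illustration and not the code from the patch.  The PREFETCH macro, the AHEAD distance, and count_hits() are hypothetical, and the fallback to __builtin_prefetch() assumes a GCC-compatible compiler on non-SSE2 targets:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* also provides _mm_prefetch() via xmmintrin.h */
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH(p) __builtin_prefetch(p)	/* GCC/Clang builtin */
#endif

#define AHEAD 8	/* hypothetical look-ahead distance, needs tuning */

/* Count how many of the n keys hit a non-zero slot in a byte table of
 * (mask + 1) entries, prefetching the slot for key i + AHEAD while
 * testing key i. */
static size_t count_hits(const uint8_t *table, size_t mask,
    const uint32_t *keys, size_t n)
{
	size_t i, hits = 0;

	assert((mask & (mask + 1)) == 0);	/* size must be a power of 2 */
	for (i = 0; i < n; i++) {
		if (i + AHEAD < n)
			PREFETCH(&table[keys[i + AHEAD] & mask]);
		hits += table[keys[i] & mask] != 0;
	}
	return hits;
}
```

Whether this helps depends on how much other work there is to overlap with the memory latency, which may be why it made no measurable difference here.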

The change to PASSWORD_HASH_SIZE_FOR_LDR speeds up startup by a few
seconds.  The table size used is 16M elements, so 128 MB on 64-bit or
64 MB on 32-bit systems.  I think that's acceptable these days,
especially given that for tiny files (which is what people might process
on tiny systems) that's just address space rather than memory
allocation.  This memory is freed after loading is complete.
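
The sizes above follow directly from the element count, assuming each element is one pointer (that assumption is inferred from the 64-bit vs. 32-bit figures).  A small sketch of the arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Size in MiB of a table of `elements` entries, each one pointer of
 * `ptr_bytes` bytes (4 on 32-bit, 8 on 64-bit systems). */
static size_t table_mib(size_t elements, size_t ptr_bytes)
{
	assert(ptr_bytes == 4 || ptr_bytes == 8);
	return elements * ptr_bytes / (1024 * 1024);
}
```

With 16M = 1 << 24 elements, table_mib((size_t)1 << 24, 8) is 128 and table_mib((size_t)1 << 24, 4) is 64.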

The change to POT_BUFFER_SIZE saves up to 1 second in this benchmark.
It helped a lot more before I disabled pot sync, although pot sync was
still way too slow even with the larger buffer.

The change to LOG_BUFFER_SIZE doesn't matter for this benchmark since I
used the -nolog option.

The addition of source() method for raw-md5 helps a lot.  Without it,
and without the copy-on-write avoidance in cracker.c, I couldn't run
8 processes on this machine without it getting into swap.  Perhaps we
should add source() to more formats.
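
The memory saving presumably comes from being able to rebuild a hash's text form from its binary form on demand, rather than keeping the original string for every loaded hash.  The sketch below shows that reconstruction for a 16-byte MD5 value.  The function name and signature are hypothetical and do not reproduce the actual source() method, and it assumes the binary is stored in plain byte order:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MD5_BINARY_SIZE 16

/* Hypothetical helper: rebuild the lowercase hex text of an MD5 hash
 * from its 16-byte binary form.  `out` must hold at least 33 bytes. */
static char *md5_hex_from_binary(char *out, const unsigned char *binary)
{
	static const char itoa16[] = "0123456789abcdef";
	size_t i;

	assert(out != NULL && binary != NULL);
	for (i = 0; i < MD5_BINARY_SIZE; i++) {
		out[2 * i] = itoa16[binary[i] >> 4];	/* high nibble */
		out[2 * i + 1] = itoa16[binary[i] & 0x0f];	/* low nibble */
	}
	out[2 * MD5_BINARY_SIZE] = '\0';
	return out;
}
```

With 29 million hashes, not storing a 32-character string plus terminator per hash avoids on the order of a gigabyte of text, before any allocator overhead.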

Alexander

View attachment "john-huge-opt1.diff" of type "text/plain" (8318 bytes)