john-dev - multi-threaded hash table initialization


Date: Thu, 17 Sep 2015 01:18:41 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: multi-threaded hash table initialization

magnum -

Attached is a patch splitting initialization of bitmap and hash table
into 3 threads (with some duplicate work to determine the hash values).
On the 29M testcase, this appears to save about 1 second on average
across multiple invocations.  Here's a good run, down from 47 seconds
before this change:

real    0m45.863s
user    3m4.159s
sys     0m19.746s
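
For readers without the attachment, here is a minimal sketch of the general idea, not the code from the patch. It assumes one thread fills the bitmap while two others each own half of the hash table buckets, and that every thread walks the full list and recomputes each hash value (the "duplicate work"). All names (struct entry, struct tables, hash_of, init_tables_mt) and the table size are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_BITS 16                  /* hypothetical size: 65536 buckets */
#define TABLE_SIZE (1u << TABLE_BITS)

struct entry {
    uint32_t binary;        /* stand-in for a loaded hash's binary value */
    struct entry *next;     /* chain within one hash table bucket */
};

struct tables {
    struct entry *entries;  /* all loaded hashes */
    size_t count;
    uint32_t *bitmap;       /* TABLE_SIZE bits */
    struct entry **buckets; /* TABLE_SIZE chain heads */
};

struct job {
    struct tables *t;
    int role;   /* 0: bitmap, 1: lower half of buckets, 2: upper half */
};

/* Hypothetical hash function; the real one depends on the format. */
static uint32_t hash_of(const struct entry *e)
{
    return (e->binary * 2654435761u) >> (32 - TABLE_BITS);
}

static void *worker(void *arg)
{
    struct job *j = arg;
    struct tables *t = j->t;
    size_t i;

    for (i = 0; i < t->count; i++) {
        struct entry *e = &t->entries[i];
        uint32_t h = hash_of(e);    /* recomputed by every thread */

        if (j->role == 0) {
            /* Only this thread writes to the bitmap. */
            t->bitmap[h / 32] |= 1u << (h % 32);
        } else if ((h >= TABLE_SIZE / 2) == (j->role == 2)) {
            /* Bucket h (and thus e->next) belongs to this thread only,
             * so no locking is needed. */
            e->next = t->buckets[h];
            t->buckets[h] = e;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be started. */
static int init_tables_mt(struct tables *t)
{
    pthread_t tid[3];
    struct job jobs[3];
    int i, started = 0, rc = 0;

    memset(t->bitmap, 0, TABLE_SIZE / 8);
    memset(t->buckets, 0, TABLE_SIZE * sizeof(*t->buckets));

    for (i = 0; i < 3; i++) {
        jobs[i].t = t;
        jobs[i].role = i;
        if (pthread_create(&tid[i], NULL, worker, &jobs[i])) {
            rc = -1;
            break;
        }
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```

Because the three threads write to disjoint memory, the only cost of the split in this sketch is that each hash value is computed three times.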

I think we can commit it and give it some more testing.  There might be
regressions for other cases, especially multi-salt with small per-salt
hash tables - maybe we need to add a check for that (would need to come
up with a hash table size threshold).  There might also be bugs.  An
older revision of this code segfaulted on me once (just once), and I've
since fixed a bug, but I don't see how that bug would have caused a
segfault.  So there might still be a bug in there.

OTOH, maybe the time savings will be greater on even bigger hash lists.

I think it's an interesting enough approach that I'd rather share it with
the team and give it a try.

Comparable time savings are (still) possible by avoiding the memset()
calls and using mmap() to allocate large enough bitmaps and hash tables,
since anonymous mappings start out zero-filled.  This could also save RAM,
and it could speed up cracking if we use "huge pages" for those
allocations - but the latter would only be a good idea without --fork or
once we introduce shared memory for these things (otherwise the RAM loss
from copy-on-write would be even worse than it is now).
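
A rough sketch of what such an allocation could look like, assuming Linux or another system with MAP_ANONYMOUS. This is not the yescrypt allocation code; the function names are hypothetical, and the madvise(MADV_HUGEPAGE) call is just one Linux-specific way to request huge pages (a hint that the kernel may ignore), used here as an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for MAP_ANONYMOUS and MADV_HUGEPAGE on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: returns "size" bytes of zero-filled memory, or NULL
 * on failure.  The kernel supplies zeroed pages on first touch, so no
 * memset() is needed and untouched pages use no physical RAM.
 */
static void *alloc_zeroed_region(size_t size, int want_huge)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return NULL;

#ifdef MADV_HUGEPAGE
    /* Best-effort hint only; failure is deliberately ignored. */
    if (want_huge)
        (void)madvise(p, size, MADV_HUGEPAGE);
#else
    (void)want_huge;
#endif

    return p;
}

/* Must be called with the same size that was passed to the allocator. */
static void free_region(void *p, size_t size)
{
    if (p)
        munmap(p, size);
}
```

With this kind of allocation the caller would skip the memset() entirely, e.g. `bitmap = alloc_zeroed_region(bitmap_bytes, 0);` where bitmap_bytes is whatever size the loader has chosen.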

I have suitable memory allocation code in yescrypt, so maybe we'll reuse
that once we integrate final yescrypt.

Alexander

View attachment "john-huge-loader-mt.diff" of type "text/plain" (4842 bytes)
