Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 15 Sep 2015 06:40:39 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array

Fred, magnum -

On Sun, Sep 13, 2015 at 08:28:08PM -0700, Fred Wang wrote:
> I use a 10-year-old Dell 2950 as my test environment, precisely because it uses slower memory, and more easily shows improvements.  For my "standard" test case (MD5, 29 million hashes, a ~13 million entry dictionary, and best64 rules, yielding about 1 billion hash attempts to find about 1.7 million solutions)
> 
> hashcat	3 minute 54 seconds
> mdxfind	1 minute 15 seconds  (Judy only)
> mdxfind	47 seconds  (Current code, Bloom filter + Judy)

With the attached patch, and running this command line:

time ./john -form=raw-md5 -w=10m.pass -ru=best64 -nolog -mem=999999999 -v=1 -fork=8 29m.txt

I am getting:

real    1m17.021s
user    6m46.399s
sys     0m20.028s

on 2x E5420 with 16 GB RAM.  The 8 processes have about 2 GB allocated
each, which initially means a little over 2 GB of real RAM for all of
them, but as passwords get cracked and pages get copied, the total
memory usage grows to slightly over 8 GB, unfortunately.  The patch
reduces the copy-on-write occurrences; without it, memory usage would be
higher yet.  Of course, this is still not great.

The patch shows my changes to john.conf (these are not to be committed).

These were most important:

-Save = 60
+Save = 600
-ReloadAtCrack = Y
+ReloadAtCrack = N
-ReloadAtDone = Y
+ReloadAtDone = N
-ReloadAtSave = Y
+ReloadAtSave = N

Effectively disabling the pot sync feature as above saves several minutes.

This saves 4 seconds, but only when -mem=999999999 is also used:

-WordlistMemoryMap = Y
+WordlistMemoryMap = N

This saves 1 second:

-NoLoaderDupeCheck = N
+NoLoaderDupeCheck = Y

This makes almost no difference (the system is otherwise idle):

-Idle = Y
+Idle = N

I think there might still be wordlist duplicate suppression going on.
It would be nice to try disabling it.

In cracker.c, only the copy-on-write reducing changes actually help in
this benchmark.  I am not entirely confident that they don't break
anything in any other cracking mode, etc. - I'd appreciate some testing
of them in jumbo before I possibly get them into the core tree.

Prefetching doesn't help in this benchmark.  In fact, without it I was
getting 1 second lower running time:

real    1m16.055s
user    6m45.974s
sys     0m20.433s

That's two separate hunks in the patch (one with "#include
<emmintrin.h>" and the other with actual code), so perhaps they should
be excluded for now.

The change to PASSWORD_HASH_SIZE_FOR_LDR speeds up startup by a few
seconds.  The table size used is 16M elements, so 128 MB on 64-bit or
64 MB on 32-bit systems.  It think that's acceptable these days,
especially given that for tiny files (which is what people might process
on tiny systems) that's just address space rather than memory
allocation.  This memory is freed after loading is complete.

The change to POT_BUFFER_SIZE saves up to 1 second in this benchmark,
but it was helping a lot more before I disabled pot sync (it was still
way too slow).

The change to LOG_BUFFER_SIZE doesn't matter for this benchmark since I
used the -nolog option.

The addition of source() method for raw-md5 helps a lot.  Without it,
and without the copy-on-write avoidance in cracker.c, I couldn't run
8 processes on this machine without it getting into swap.  Perhaps we
should add source() to more formats.

Alexander

diff -urp bleeding-jumbo.orig/run/john.conf bleeding-jumbo-opt/run/john.conf
--- bleeding-jumbo.orig/run/john.conf	2015-09-14 12:19:49 +0000
+++ bleeding-jumbo-opt/run/john.conf	2015-09-15 01:17:10 +0000
@@ -26,9 +26,9 @@
 # Default wordlist file name (including in batch mode)
 Wordlist = $JOHN/password.lst
 # Use idle cycles only
-Idle = Y
+Idle = N
 # Crash recovery file saving delay in seconds
-Save = 60
+Save = 600
 # Beep when a password is found (who needs this anyway?)
 Beep = N
 # if set to Y then dynamic format will always work with bare hashes. Normally
@@ -72,7 +72,7 @@ TimeFormat = %Y-%m-%d %H:%M
 TimeFormat24 = %H:%M:%S
 
 # Set this to N to disable use of memory-mapping in wordlist mode.
-WordlistMemoryMap = Y
+WordlistMemoryMap = N
 
 # For single mode, load the full GECOS field (before splitting) as one
 # additional candidate. Normal behavior is to only load individual words
@@ -120,7 +120,7 @@ StatusShowCandidates = N
 LogCrackedPasswords = N
 
 # Disable the dupe checking when loading hashes. For testing purposes only!
-NoLoaderDupeCheck = N
+NoLoaderDupeCheck = Y
 
 # Default encoding for input files (ie. login/GECOS fields) and wordlists
 # etc.  If this is not set here and --encoding is not used either, the default
@@ -177,18 +177,18 @@ SecureMode = N
 # If set to Y, a session using --fork or MPI will signal to other nodes when
 # it has written cracks to the pot file, so they will re-sync. Note that this
 # may be delayed by buffers and the "Save" timer setting near top of this file.
-ReloadAtCrack = Y
+ReloadAtCrack = N
 
 # If set to Y, a session using --fork or MPI will signal to other nodes when
 # it has cracked all hashes (there's nothing more to do!). This is ignored
 # when ReloadAtCrack = Y because it's redundant.
-ReloadAtDone = Y
+ReloadAtDone = N
 
 # If set to Y, resync pot file when saving session. This does not involve any
 # signalling, we just detect that someone else wrote to the pot file.
 # This will sync with concurrent sessions even when not using --fork or MPI
 # but it may be delayed by the "Save" timer setting near top of this file.
-ReloadAtSave = Y
+ReloadAtSave = N
 
 # If this file exists, john will abort cleanly
 AbortFile = /var/run/john/abort
diff -urp bleeding-jumbo.orig/src/cracker.c bleeding-jumbo-opt/src/cracker.c
--- bleeding-jumbo.orig/src/cracker.c	2015-08-06 13:25:11 +0000
+++ bleeding-jumbo-opt/src/cracker.c	2015-09-15 02:53:29 +0000
@@ -32,6 +32,10 @@
 #include <io.h> // open()
 #endif
 
+#ifdef __SSE2__
+#include <emmintrin.h>
+#endif
+
 #include "arch.h"
 #include "misc.h"
 #include "math.h"
@@ -225,7 +229,7 @@ static void crk_remove_salt(struct db_sa
  */
 static void crk_remove_hash(struct db_salt *salt, struct db_password *pw)
 {
-	struct db_password **current;
+	struct db_password **start, **current;
 	int hash, count;
 
 	crk_db->password_count--;
@@ -251,15 +255,23 @@ static void crk_remove_hash(struct db_sa
 
 	hash = crk_db->format->methods.binary_hash[salt->hash_size](pw->binary);
 	count = 0;
-	current = &salt->hash[hash >> PASSWORD_HASH_SHR];
+	start = current = &salt->hash[hash >> PASSWORD_HASH_SHR];
 	do {
 		if (crk_db->format->methods.binary_hash[salt->hash_size]
 		    ((*current)->binary) == hash)
 			count++;
-		if (*current == pw)
+		if (*current == pw) {
+/*
+ * If we can, skip the write to hash table to avoid unnecessary page
+ * copy-on-write when running with "--fork".  We can do this when we're about
+ * to remove this entry from the bitmap, which we'd be checking first.
+ */
+			if (count == 1 && current == start && !pw->next_hash)
+				break;
 			*current = pw->next_hash;
-		else
+		} else {
 			current = &(*current)->next_hash;
+		}
 	} while (*current);
 
 	assert(count >= 1);
@@ -276,9 +288,10 @@ static void crk_remove_hash(struct db_sa
 /*
  * If there's a hash table for this salt, assume that the list is only used by
  * "single crack" mode, so mark the entry for removal by "single crack" mode
- * code in case that's what we're running, instead of traversing the list here.
+ * code if that's what we're running, instead of traversing the list here.
  */
-	pw->binary = NULL;
+	if (crk_guesses)
+		pw->binary = NULL;
 }
 
 /* Negative index is not counted/reported (got it from pot sync) */
@@ -743,11 +756,25 @@ static int crk_password_loop(struct db_s
 			}
 		} while ((pw = pw->next));
 	} else
-	for (index = 0; index < match; index++) {
-		int hash = salt->index(index);
-		if (salt->bitmap[hash / (sizeof(*salt->bitmap) * 8)] &
-		    (1U << (hash % (sizeof(*salt->bitmap) * 8)))) {
-			pw = salt->hash[hash >> PASSWORD_HASH_SHR];
+	for (index = 0; index < match; ) {
+		int slot, ahead, target = index + 64;
+		int h[64];
+		unsigned int *b[64];
+		if (target > match)
+			target = match;
+		for (slot = 0, ahead = index; ahead < target; slot++, ahead++) {
+			h[slot] = salt->index(ahead);
+			b[slot] = &salt->bitmap[h[slot] / (sizeof(*salt->bitmap) * 8)];
+#ifdef __SSE2__
+			_mm_prefetch((const char *)b[slot], _MM_HINT_NTA);
+//			_mm_prefetch((const char *)&salt->hash[h[slot] >> PASSWORD_HASH_SHR], _MM_HINT_NTA);
+#else
+			*(volatile unsigned int *)b[slot];
+#endif
+		}
+		for (slot = 0; index < target; slot++, index++)
+		if (*b[slot] & (1U << (h[slot] % (sizeof(*salt->bitmap) * 8)))) {
+			pw = salt->hash[h[slot] >> PASSWORD_HASH_SHR];
 			do {
 				if (crk_methods.cmp_one(pw->binary, index))
 				if (crk_methods.cmp_exact(crk_methods.source(
diff -urp bleeding-jumbo.orig/src/loader.c bleeding-jumbo-opt/src/loader.c
--- bleeding-jumbo.orig/src/loader.c	2015-09-03 04:37:15 +0000
+++ bleeding-jumbo-opt/src/loader.c	2015-09-14 23:15:40 +0000
@@ -198,8 +198,11 @@ static void ldr_init_password_hash(struc
 	int (*func)(void *binary);
 	int size = PASSWORD_HASH_SIZE_FOR_LDR;
 
-	if (size > 0 && mem_saving_level >= 2)
+	if (size >= 2 && mem_saving_level >= 2) {
 		size--;
+		if (mem_saving_level >= 3)
+			size--;
+	}
 
 	do {
 		func = db->format->methods.binary_hash[size];
diff -urp bleeding-jumbo.orig/src/params.h bleeding-jumbo-opt/src/params.h
--- bleeding-jumbo.orig/src/params.h	2015-06-28 10:29:22 +0000
+++ bleeding-jumbo-opt/src/params.h	2015-09-15 01:08:49 +0000
@@ -195,7 +195,7 @@
  * its own purposes.  This does not affect password cracking speed after the
  * loading is complete.
  */
-#define PASSWORD_HASH_SIZE_FOR_LDR	4
+#define PASSWORD_HASH_SIZE_FOR_LDR	5
 
 /*
  * Hash table sizes.  These may also be hardcoded into the hash functions.
@@ -348,8 +348,8 @@ extern int password_hash_thresholds[PASS
 /*
  * john.pot and log file buffer sizes, can be zero.
  */
-#define POT_BUFFER_SIZE			0x8000
-#define LOG_BUFFER_SIZE			0x8000
+#define POT_BUFFER_SIZE			0x100000
+#define LOG_BUFFER_SIZE			0x100000
 
 /*
  * Buffer size for path names.
diff -urp bleeding-jumbo.orig/src/rawMD5_fmt_plug.c bleeding-jumbo-opt/src/rawMD5_fmt_plug.c
--- bleeding-jumbo.orig/src/rawMD5_fmt_plug.c	2015-08-22 00:59:05 +0000
+++ bleeding-jumbo-opt/src/rawMD5_fmt_plug.c	2015-09-15 00:00:46 +0000
@@ -70,7 +70,7 @@ john_register_one(&fmt_rawMD5);
 #define CIPHERTEXT_LENGTH		32
 
 #define DIGEST_SIZE				16
-#define BINARY_SIZE				4
+#define BINARY_SIZE				16
 #define BINARY_ALIGN			4
 #define SALT_SIZE				0
 #define SALT_ALIGN				1
@@ -155,11 +155,8 @@ static int valid(char *ciphertext, struc
 		p += TAG_LENGTH;
 
 	q = p;
-	while (atoi16[ARCH_INDEX(*q)] != 0x7F) {
-		if (*q >= 'A' && *q <= 'F') /* support lowercase only */
-			return 0;
+	while (atoi16l[ARCH_INDEX(*q)] != 0x7F)
 		q++;
-	}
 	return !*q && q - p == CIPHERTEXT_LENGTH;
 }
 
@@ -215,6 +212,27 @@ static void *get_binary(char *ciphertext
 	return out;
 }
 
+static char *source(char *source, void *binary)
+{
+	static char out[TAG_LENGTH + CIPHERTEXT_LENGTH + 1] = FORMAT_TAG;
+	ARCH_WORD_32 b[4];
+	char *p;
+	int i, j;
+
+	memcpy(b, binary, sizeof(b));
+
+#if SIMD_COEF_32 && defined(REVERSE_STEPS)
+	b[0] += INIT_A;
+#endif
+
+	p = &out[TAG_LENGTH];
+	for (i = 0; i < 4; i++)
+		for (j = 0; j < 8; j++)
+			*p++ = itoa16[(b[i] >> ((j ^ 1) * 4)) & 0xf];
+
+	return out;
+}
+
 #ifdef SIMD_COEF_32
 static void set_key(char *_key, int index)
 {
@@ -438,7 +456,7 @@ struct fmt_main fmt_rawMD5 = {
 		get_binary,
 		fmt_default_salt,
 		{ NULL },
-		fmt_default_source,
+		source,
 		{
 			fmt_default_binary_hash_0,
 			fmt_default_binary_hash_1,

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ