john-dev - RE: Question over validity of concept in JtR's benchmarking code.

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <023b01cc6f0d$6f1496a0$4d3dc3e0$@net>
Date: Fri, 9 Sep 2011 11:27:57 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: Question over validity of concept in JtR's benchmarking code.

>From: Solar Designer [mailto:solar@...nwall.com]
>
>Jim -
>
> (I haven't considered your specific proposed changes yet.)
>
>I also would like to have benchmarking continue to test the functions
>for correct operation.  Right now, those extra seconds are wasted in
>terms of testing.  We're only using them to get the performance numbers.

This patch, does much better, and 'is' a proposal for changed behavior.

There is zero change to any existing format.  However, formats can now be
created which will 'smash' the existing passwords.

The change here, is if benchmark_length is less than -950, I then make
copies of all of the non null passwords for the format, and point each
format record to the new password, and smash the first letter of each word.
This is done right before the first call to bench_set_keys, in
benchmark_format.    Also, I add 1000 to benchmark_length.  So, -1000 will
be the 'same' as benchmark_length==0, but will smash the passwords.
benchmark_length=-1001 is the same as benchmark_length=-1, etc.

Then at the bottom of the format, if I had made these 'broken' copies, I
'fix' the passwords, and subtract 1000 from benchmark_length  This is so
when benchmark_format() is called the next time, things are setup
'properly'.


The possible side effect 'only' I can see from doing this is:

1. I use an array of 1024 char pointers, so if a format is built with more
test cases than this, we have problems.  We can add 'max count' checking
into the loop (and should), but this will still leave 'correct' passwords.
2. the plaintext values are changed from being in the CSEG (usually CPU
const), into the heap, thus are writable.
3. There is no guarantee that smashing a password, will not create a correct
one.  If one salt had a plaintext of password, and the next salt used
uassword, then the method of smashing would generate the 'correct' plaintext
for the 2nd salt from the 1st salts plaintext.


However, since bench is only called in benchmark testing, and john exists
after that, I do not think that #2 should be an issue.  For #3, this is
simply something that the format author would have to address.


So, if this patch is applied, and then BENCHMARK_TIME is set to -1001 in
pkzip_fmt, sha_fmt and pdf_fmt, then each of those would use 'smashed'
password, and bench like -1, while all of the rest of JtR would be
non-effected, and would use the constant strings from fmt_tst[x].plaintext,
i.e. the correct password.

It is very likely that we now 'can' change this, so that ALL formats use
'smashed' passwords, and there should be no degredation in speed which was
seen in my prior strcpy hack.  Likely there 'may' be a slight improvement on
SUPER fast formats, like NT, since the passwords are now actually closer
together in the physical memory than they were before.

But this -v2 'could' be used.   It is certainly better than the first
'hacked' version.



NOTE: I did look closer into the benchmark code, and it does this:

while (not done) {
  if (single_salt || keys_are_supposed_to_load)
    bench_set_keys()
  if (salts>1)
    fmt->set_salt(next_salt)
  fmt->crypt_all()
  fmt->cmp_all()
}

Thus john IS FULLY benchmarking exactly what the format actually does in
'normal' usage.  However, the cmp_all may be a little slower (possibly
noticeably slower, likely for NT) if we provide the wrong passwords, since
the cmp_all will likely have to test all items, which is exactly what it
would be doing on a 'real' run.  If all passwords are right, then it is very
possible, that cmp_all() would return true checking only the first item, and
if there are many items, this would cause it to be artificially returning
faster than in an actual run situation.  I have not checked into this
'theory' yet, but I still contend that we should be using wrong (but
'usable') passwords within the bench() loop.


Jim.


This patch is against the original bench.c  (1.7.8-jumbo-6).  The original
'hack' patch should be ignored.

diff -urpN john-1.7.8-jumbo-6a/src/bench.c john-1.7.8-jumbo-6b/src/bench.c
--- john-1.7.8-jumbo-6a/src/bench.c	2011-08-03 16:19:13.000000000 +0000
+++ john-1.7.8-jumbo-6b/src/bench.c	2011-09-09 15:50:56.375000000 +0000
@@ -114,6 +114,8 @@ char *benchmark_format(struct fmt_main *
 	static void *binary = NULL;
 	static int binary_size = 0;
 	static char s_error[64];
+	char *TmpPW[1024];
+	int pw_mangled=0;
 	char *where;
 	struct fmt_tests *current;
 	int cond;
@@ -163,8 +165,29 @@ char *benchmark_format(struct fmt_main *
 	if (format->params.benchmark_length > 0) {
 		cond = (salts == 1) ? 1 : -1;
 		salts = 1;
-	} else
+	} else {
 		cond = 0;
+		if (format->params.benchmark_length < -950) {
+			/* smash the passwords */
+			struct fmt_tests *current = format->params.tests;
+			int i=0;
+			pw_mangled = 1;
+			while (current->ciphertext) {
+				if (current->plaintext[0]) {
+					TmpPW[i] =
str_alloc_copy(current->plaintext);
+					TmpPW[i][0] ^= 5;
+					current->plaintext = TmpPW[i++];
+				}
+				++current;
+			}
+			/* -1001 turns into -1 and -1000 turns into 0 , and
-999 turns into 1 for benchmark length */
+			format->params.benchmark_length += 1000;
+			if (format->params.benchmark_length > 0) {
+				cond = (salts == 1) ? 1 : -1;
+				salts = 1;
+			}
+		}
+	}
 
 	bench_set_keys(format, current, cond);
 
@@ -239,6 +262,21 @@ char *benchmark_format(struct fmt_main *
 	for (index = 0; index < 2; index++)
 		MEM_FREE(two_salts[index]);
 
+	/* unsmash the passwords */
+	if (pw_mangled) {
+		struct fmt_tests *current = format->params.tests;
+		int i=0;
+		while (current->ciphertext) {
+			if (current->plaintext[0]) {
+				TmpPW[i][0] ^= 5;
+				current->plaintext = TmpPW[i++];
+			}
+			++current;
+		}
+		/* -1001 turns into -1 and -1000 turns into 0 , and -999
turns into 1 for benchmark length */
+		format->params.benchmark_length -= 1000;
+	}
+
 	return event_abort ? "" : NULL;
 }


Download attachment "bench-fixes-2.diff" of type "application/octet-stream" (1919 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.