Date: Mon, 28 Jun 2010 16:16:47 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES parallelization with OpenMP

Hi,

You can follow Openwall on Twitter now - announcements only, very low
volume - http://twitter.com/Openwall - and please retweet those whenever
you can. :-)  Several RSS feeds are also listed on the John the Ripper
homepage - http://www.openwall.com/john/ - one from Twitter and several
from Gmane.

Now to the actual stuff:

I've identified an oversight in the 1.7.6-omp-des-2 patch - all threads
were using the "ones" variable from thread slot 0, which was too close
to that slot's written-to fields (same page).  I could correct this to
use per-thread "ones", but doing so was tricky in the existing code,
where the "tp" pointer is local to the "crypt body" functions, whereas
the "ones" variable is needed by S-box functions (in practice they're
inlined into the "crypt bodies", but this does not help with variable
scope at the source code level).  So I've hacked around this differently
in 1.7.6-omp-des-3 - by keeping this variable separate and at a distance
from the thread slots.  Hopefully, the cache associativity is
sufficient, and it's at most 1 cache line to bounce anyway.  The new
patch is uploaded to the wiki:

http://openwall.info/wiki/john/patches

and I've also attached a diff against -omp-des-2 to this message.
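
For illustration only, here is a minimal stand-alone sketch of the layout idea described above: a shared read-only constant is placed in the middle of a large array of its own, so that it ends up well away from the per-thread slots that are written to constantly.  The names slot_t, slots, ones_pad, ONES, init_ones() and ones_gap_from_slots() are hypothetical and are not taken from John's source; only the 0x1000 element count and the "use the middle element" trick mirror the attached diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS    8       /* hypothetical number of thread slots */
#define PAD_WORDS 0x1000  /* same element count as DES_bs_ones in the patch */

/* Hypothetical stand-in for a per-thread slot that its thread writes often. */
typedef struct {
	unsigned long scratch[64];
} slot_t;

static slot_t slots[NSLOTS];

/*
 * The shared read-only constant sits in the middle of its own large array,
 * so roughly PAD_WORDS / 2 words of padding lie on either side of it.
 */
static unsigned long ones_pad[PAD_WORDS];
#define ONES (&ones_pad[PAD_WORDS / 2])

/* Set every bit of the padded array, including the word ONES points to. */
void init_ones(void)
{
	memset(ones_pad, -1, sizeof(ones_pad));
	assert(*ONES == ~0UL);
}

/*
 * Gap in bytes between the ONES word and the nearest byte of slots[].
 * Comparing addresses of distinct objects via uintptr_t assumes a flat
 * address space, which holds on mainstream platforms.
 */
size_t ones_gap_from_slots(void)
{
	uintptr_t o_lo = (uintptr_t)ONES;
	uintptr_t o_hi = o_lo + sizeof(*ONES);
	uintptr_t s_lo = (uintptr_t)slots;
	uintptr_t s_hi = s_lo + sizeof(slots);

	if (o_lo >= s_hi)
		return (size_t)(o_lo - s_hi);
	if (s_lo >= o_hi)
		return (size_t)(s_lo - o_hi);
	return 0; /* overlap; cannot happen for distinct objects */
}
```

Wherever the linker places the two objects, the padding on both sides of ONES keeps it at least a few kilobytes away from slots[], so the constant no longer shares a cache line (or a page) with fields that other threads keep writing.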

Another change is setting "Idle = N" instead of the old default of
"Idle = Y".  This makes John less sensitive to other server load, but
unfortunately it also makes John less friendly to those other tasks.

The speed has improved - see below.

On Mon, Jun 28, 2010 at 02:44:02AM +0400, Solar Designer wrote:
> Next test system, Core i7 920 2.67 GHz (quad-core, 2 logical CPUs per
> core) with a little bit of other load (perhaps under 5%).  Normally, it
> does around 2600K c/s for multi-salt with one non-parallelized process,
> 11500K c/s combined with 8 separate processes.
> 
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:     9802K c/s real, 1237K c/s virtual
> Only one salt:  4683K c/s real, 585472 c/s virtual

Same system, new benchmark:

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     10025K c/s real, 1251K c/s virtual
Only one salt:  4786K c/s real, 597525 c/s virtual

This is due to the "ones" change only.  (The "Idle" setting has no
effect on "--test" benchmarks.  It only comes into play during actual
John runs.)
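
To put a number on that improvement, here is a small sketch that computes the relative gain from the "real" c/s figures of the two benchmark runs above.  The helper names pct_gain(), many_salts_gain() and one_salt_gain() are made up for this example.

```c
#include <assert.h>

/* Relative gain of `after` over `before`, in percent. */
double pct_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) * 100.0 / before;
}

/* "Many salts" figures from the two --test runs, in K c/s (real). */
double many_salts_gain(void)
{
	return pct_gain(9802.0, 10025.0); /* about 2.3% */
}

/* "Only one salt" figures from the two --test runs, in K c/s (real). */
double one_salt_gain(void)
{
	return pct_gain(4683.0, 4786.0); /* about 2.2% */
}
```

Both cases come out at a little over 2%, which is a small but consistent gain for the multi-salt and single-salt benchmarks alike.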

> guesses: 14  time: 0:00:00:34  c/s: 9491K  trying: maiemaie - mbvnmbvn

We're getting better actual speed now as well:

host!solar:~/john/john-1.7.6-omp-des-3/run$ ./john -e=double --salts=-2 ~/john/pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
mimi             (u3044-des)
aaaa             (u1638-des)
xxxx             (u845-des)
aaaaaa           (u156-des)
bebe             (u1731-des)
gigi             (u2082-des)
jojo             (u3027-des)
lulu             (u3034-des)
booboo           (u171-des)
cloclo           (u2989-des)
cccccc           (u982-des)
guesses: 11  time: 0:00:00:00  c/s: 9183K  trying: iciici - jprjpr
jamjam           (u2207-des)
simsim           (u2671-des)
ssssss           (u3087-des)
guesses: 14  time: 0:00:00:03  c/s: 9777K  trying: amkaamka - anxjanxj
guesses: 14  time: 0:00:00:08  c/s: 9884K  trying: clxcclxc - cnklcnkl
guesses: 14  time: 0:00:00:24  c/s: 9930K  trying: irzgirzg - itmpitmp
guesses: 14  time: 0:00:00:38  c/s: 9924K  trying: oejkoejk - ofwtofwt
woofwoof         (u1435-des)
guesses: 15  time: 0:00:01:09  c/s: 9838K  trying: zzwmzzwm - zzzzzzzz

> Finally, a faster system - dual Xeon X5460 (8 cores total) at 3.16 GHz,
> but with a bit more load (around 10%).  The benchmarks vary a lot:

The benchmarks still vary a lot (the load is changing), so I'll only
provide a test run output (under load).  Previously, I got:

> guesses: 15  time: 0:00:01:23  c/s: 8203K  trying: zzwmzzwm - zzzzzzzz

This time, things are a lot better:

host!solar:~/john$ ./john-omp-des-3 -e=double --salts=-2 pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
mimi             (u3044-des)
aaaa             (u1638-des)
xxxx             (u845-des)
aaaaaa           (u156-des)
bebe             (u1731-des)
gigi             (u2082-des)
jojo             (u3027-des)
lulu             (u3034-des)
booboo           (u171-des)
cloclo           (u2989-des)
cccccc           (u982-des)
jamjam           (u2207-des)
guesses: 12  time: 0:00:00:00  c/s: 16384K  trying: odwodw - prfprf
simsim           (u2671-des)
ssssss           (u3087-des)
guesses: 14  time: 0:00:00:03  c/s: 15406K  trying: bjcubjcu - bkqdbkqd
guesses: 14  time: 0:00:00:07  c/s: 14431K  trying: dlqqdlqq - dndzdndz
guesses: 14  time: 0:00:00:12  c/s: 14441K  trying: ghwmghwm - gjjvgjjv
guesses: 14  time: 0:00:00:17  c/s: 14526K  trying: jfpsjfps - jhdbjhdb
guesses: 14  time: 0:00:00:25  c/s: 14642K  trying: nyhwnyhw - nzvfnzvf
woofwoof         (u1435-des)
guesses: 15  time: 0:00:00:46  c/s: 14722K  trying: zzwmzzwm - zzzzzzzz

In this case, I think the speedup is mostly due to "Idle = N", which
makes context switches less frequent (when there's other demand for CPU
time) and thereby keeps the threads almost in sync more often.

Enjoy, and don't forget to provide your feedback.

Alexander

P.S. Setting DES_BS_EXPAND_MERGED to 1 provides some speedup for the
single-salt case (as expected, because the "expand" step gets
parallelized then), but it slows things down a bit for the multi-salt
case (at least on the Core i7 system).  The latter is unexpected - if
anyone can figure out the cause of the slowdown (or show that it
doesn't occur on other builds/systems maybe?), please let me know.

diff -urp john-1.7.6-omp-des-2/run/john.conf john-1.7.6-omp-des-3/run/john.conf
--- john-1.7.6-omp-des-2/run/john.conf	2010-02-26 01:22:09 +0000
+++ john-1.7.6-omp-des-3/run/john.conf	2010-06-28 11:23:46 +0000
@@ -7,7 +7,8 @@
 # Wordlist file name, to be used in batch mode
 Wordlist = $JOHN/password.lst
 # Use idle cycles only
-Idle = Y
+;Idle = Y
+Idle = N
 # Crash recovery file saving delay in seconds
 Save = 600
 # Beep when a password is found (who needs this anyway?)
diff -urp john-1.7.6-omp-des-2/src/DES_bs.c john-1.7.6-omp-des-3/src/DES_bs.c
--- john-1.7.6-omp-des-2/src/DES_bs.c	2010-06-27 17:10:52 +0000
+++ john-1.7.6-omp-des-3/src/DES_bs.c	2010-06-28 11:09:21 +0000
@@ -39,6 +39,7 @@
 #if !DES_BS_ASM
 DES_bs_combined CC_CACHE_ALIGN DES_bs_all[DES_bs_mt];
 #endif
+DES_bs_vector DES_bs_ones[0x1000];
 
 static unsigned char DES_LM_KP[56] = {
 	1, 2, 3, 4, 5, 6, 7,
@@ -138,6 +139,7 @@ for_each_t() {
 #elif defined(__MMX__) || defined(__SSE2__)
 	memset(tp->ones, -1, sizeof(tp->ones));
 #endif
+	memset(DES_bs_ones, -1, sizeof(DES_bs_ones));
 } /*t*/
 }
 
diff -urp john-1.7.6-omp-des-2/src/DES_bs.h john-1.7.6-omp-des-3/src/DES_bs.h
--- john-1.7.6-omp-des-2/src/DES_bs.h	2010-06-27 17:13:56 +0000
+++ john-1.7.6-omp-des-3/src/DES_bs.h	2010-06-28 11:11:27 +0000
@@ -72,6 +72,8 @@ typedef struct {
 	int gap[2]; /* XXX: SSE2 alignment hack for DES_bs_mt > 1 */
 } DES_bs_combined;
 
+extern DES_bs_vector DES_bs_ones[0x1000];
+
 #if DES_BS_ASM
 #define DES_bs_mt			1
 #else
diff -urp john-1.7.6-omp-des-2/src/DES_bs_b.c john-1.7.6-omp-des-3/src/DES_bs_b.c
--- john-1.7.6-omp-des-2/src/DES_bs_b.c	2010-06-27 17:07:24 +0000
+++ john-1.7.6-omp-des-3/src/DES_bs_b.c	2010-06-28 11:13:10 +0000
@@ -8,7 +8,7 @@
 #if !DES_BS_ASM
 #include "DES_bs.h"
 
-#define _ones ((vtype *)DES_bs_all[0].ones)
+#define _ones ((vtype *)&DES_bs_ones[0x800])
 
 #if defined(__ALTIVEC__) && DES_BS_DEPTH == 128
 #undef DES_BS_VECTOR
diff -urp john-1.7.6-omp-des-2/src/params.h john-1.7.6-omp-des-3/src/params.h
--- john-1.7.6-omp-des-2/src/params.h	2010-06-27 15:01:28 +0000
+++ john-1.7.6-omp-des-3/src/params.h	2010-06-28 11:25:37 +0000
@@ -15,7 +15,7 @@
 /*
  * John's version number.
  */
-#define JOHN_VERSION			"1.7.6-des-2"
+#define JOHN_VERSION			"1.7.6-des-3"
 
 /*
  * Notes to packagers of John for *BSD "ports", Linux distributions, etc.:
