Date: Wed, 30 Jun 2010 06:13:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES parallelization with OpenMP

New best benchmark (dual Xeon X5460 3.16 GHz, under some unrelated load):

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 20889K c/s real, 2607K c/s virtual
Only one salt: 5701K c/s real, 711814 c/s virtual

That's over 87% efficiency for the multi-salt case (I say "over"
considering that there was a bit of other load).

guesses: 15 time: 0:00:00:36 c/s: 18655K trying: zntkzntk - zzzzzzzz

This is john-1.7.6-omp-des-4, already uploaded to:

http://openwall.info/wiki/john/patches

On Wed, Jun 30, 2010 at 04:42:26AM +0400, Solar Designer wrote:
> ... Changing DES_bs_mt from 8 to 96, I am getting a 1% to 2% slowdown on
> an otherwise idle system,

I was too quick to state that. I forgot that higher DES_bs_mt may also
make it feasible to parallelize set_salt() and even cmp_all(). Taking
care of that and increasing DES_bs_mt further to 192, I reclaimed the
old speed and more on an almost idle system. On the Core i7 920 2.67 GHz
system, I am now getting:

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 10174K c/s real, 1267K c/s virtual
Only one salt: 4841K c/s real, 602923 c/s virtual

That's 88% efficiency (of 11500K for 8 separate processes).

To avoid wasting CPU time when an actual run is about to terminate -
when it has fewer than a full chunk of candidate passwords yet to test -
I also enhanced the "crypt bodies" to perform only the required number
of loop iterations. With this, I am getting:

host!solar:~/john/john-1.7.6-omp-des/run$ ./john -e=double --salts=-2 ~/john/pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
simsim (u2671-des)
[...]
ssssss (u3087-des)
guesses: 14 time: 0:00:00:03 c/s: 9873K trying: ajjgajjg - btslbtsl
guesses: 14 time: 0:00:00:09 c/s: 10019K trying: btsmbtsm - debrdebr
guesses: 14 time: 0:00:00:15 c/s: 10053K trying: eokyeoky - fyudfyud
woofwoof (u1435-des)
guesses: 15 time: 0:00:01:02 c/s: 10055K trying: wtaywtay - ydkdydkd
guesses: 15 time: 0:00:01:08 c/s: 10004K trying: zntkzntk - zzzzzzzz

So 10M c/s on the Core i7 is achieved in practice.

On the dual Xeon, for which I included the new 20M benchmark at the
start of this message, an actual run now does:

host!solar:~/john$ ./john-omp-des-4 -e=double --salts=-2 pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
simsim (u2671-des)
cloclo (u2989-des)
mimi (u3044-des)
aaaa (u1638-des)
xxxx (u845-des)
aaaaaa (u156-des)
jamjam (u2207-des)
booboo (u171-des)
bebe (u1731-des)
gigi (u2082-des)
cccccc (u982-des)
jojo (u3027-des)
lulu (u3034-des)
ssssss (u3087-des)
guesses: 14 time: 0:00:00:01 c/s: 19487K trying: ajjgajjg - btslbtsl
guesses: 14 time: 0:00:00:06 c/s: 20544K trying: eokyeoky - fyudfyud
guesses: 14 time: 0:00:00:16 c/s: 18641K trying: kdvwkdvw - lofblofb
guesses: 14 time: 0:00:00:27 c/s: 18626K trying: snzgsnzg - tyiltyil
woofwoof (u1435-des)
guesses: 15 time: 0:00:00:36 c/s: 18655K trying: zntkzntk - zzzzzzzz

As you can see, it actually exceeds 20M at times, but then goes below
that because of the changing non-John load.

Any feedback? Anyone to test this on other systems, with other versions
of gcc (needs 4.2 or newer, but I only tested 4.5.0), etc?

Alexander
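
For illustration only, here is a minimal sketch of the "perform only the required number of loop iterations" idea described above. It is not the code from the john-1.7.6-omp-des-4 patch. All names in it (KEYS_PER_BLOCK, blocks_needed, crypt_block, crypt_pending) are hypothetical, and the figure of 128 candidates per bitslice block is an assumption read off the "[128/128 BS SSE2-16]" label. The point is just that the parallel loop bound is computed from the number of candidates still pending rather than from the full chunk size.

```c
#include <stddef.h>

/* Assumption: one bitslice block carries 128 candidate passwords. */
#define KEYS_PER_BLOCK 128

/* How many blocks are needed to hold keys_pending candidates (rounded up). */
static size_t blocks_needed(size_t keys_pending)
{
    return (keys_pending + KEYS_PER_BLOCK - 1) / KEYS_PER_BLOCK;
}

/* Hypothetical stand-in for the real per-block crypt body:
 * it only marks the block as processed. */
static void crypt_block(unsigned int *done, size_t block)
{
    done[block] = 1;
}

/* Process only the blocks that actually contain candidates.
 * Returns the number of loop iterations performed. */
static size_t crypt_pending(unsigned int *done, size_t keys_pending)
{
    long n = (long)blocks_needed(keys_pending);
    long i;

    /* Each iteration writes a distinct element of done[], so the
     * iterations are independent and can be split across threads. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < n; i++)
        crypt_block(done, (size_t)i);

    return (size_t)n;
}
```

With 300 candidates left, crypt_pending() would run 3 iterations under these assumptions, however many blocks a full chunk holds. Built without OpenMP enabled, the same loop simply runs sequentially.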