Date: Wed, 30 Jun 2010 04:42:26 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES parallelization with OpenMP

On Mon, Jun 28, 2010 at 02:44:02AM +0400, Solar Designer wrote:
> (The OpenMP parallelization code for slower hashes that is integrated
> into JtR 1.7.6 is a lot less sensitive to system load, likely because
> the synchronization is less frequent.  Maybe the same could be achieved
> by buffering a lot more candidate passwords with these faster hashes -
> beyond the current setting of 1024 - although the candidates would not
> fit in L1 data cache then, likely resulting in a minor slowdown when
> there's no other load.)

I've just confirmed the above guess.  Changing DES_bs_mt from 8 to 96,
I am getting a 1% to 2% slowdown on an otherwise idle system, but
roughly a 10% speedup on a system under significant non-John load.

Here's a new benchmark for the same 8-core Xeon system that was doing
up to 17.6M c/s before:

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     19255K c/s real, 2403K c/s virtual
Only one salt:  5640K c/s real, 704143 c/s virtual

Benchmarking: BSDI DES (x725) [128/128 BS SSE2-16]... DONE
Many salts:     669148 c/s real, 83643 c/s virtual
Only one salt:  530070 c/s real, 67080 c/s virtual

The server load has stayed roughly the same - I can still reproduce the
under-18M c/s best speed with the previous build.
Here's a new test run:

host!solar:~/john$ ./john-omp-des-3-96 -e=double --salts=-2 pw-fake-unix
Loaded 1458 password hashes with 1458 different salts (Traditional DES [128/128 BS SSE2-16])
cloclo           (u2989-des)
mimi             (u3044-des)
aaaa             (u1638-des)
xxxx             (u845-des)
aaaaaa           (u156-des)
jamjam           (u2207-des)
booboo           (u171-des)
bebe             (u1731-des)
gigi             (u2082-des)
cccccc           (u982-des)
jojo             (u3027-des)
lulu             (u3034-des)
simsim           (u2671-des)
ssssss           (u3087-des)
guesses: 14  time: 0:00:00:01  c/s: 18158K  trying: ajjgajjg - bbnvbbnv
guesses: 14  time: 0:00:00:06  c/s: 17381K  trying: debsdebs - dwghdwgh
guesses: 14  time: 0:00:00:12  c/s: 18145K  trying: ibiaibia - itmpitmp
guesses: 14  time: 0:00:00:18  c/s: 17529K  trying: lofclofc - mgjrmgjr
guesses: 14  time: 0:00:00:27  c/s: 17059K  trying: rdqardqa - rvuprvup
guesses: 14  time: 0:00:00:34  c/s: 17017K  trying: wawiwawi - wtaxwtax
woofwoof         (u1435-des)
guesses: 15  time: 0:00:00:36  c/s: 16934K  trying: xlfoxlfo - ydkdydkd
guesses: 15  time: 0:00:00:40  c/s: 16972K  trying: zntkzntk - zzzzzzzz

In john.log, I am getting:

0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 12288

Previously, with chunks of "only" 1024 candidate passwords, this would
complete in no less than 46 seconds:

guesses: 15  time: 0:00:00:46  c/s: 14722K  trying: zzwmzzwm - zzzzzzzz

1024 is the minimum for an 8-core system with SSE2 - all of those
candidate passwords are actually "in flight" on the CPUs at the same
time (8 cores computing 128 hashes each, in the 128 different bit
positions).  When the number of thread slots is increased beyond 8,
there are still only 1024 different candidates being processed by the
CPUs at the same time (with 8 threads), but they don't always have to
be from the same consecutive block of 1024 buffered ones - they can be
scattered (to some extent) over the larger chunk.  This lets the
threads stay out of sync for longer (if they get out of sync for
whatever reason), joining them only at a later time (less frequently).
By that time, chances are that the relative time needed for the threads
to get back in sync (as a fraction of the total time the threads ran)
will be smaller.

Also, the larger chunks may make it reasonable to use
OMP_WAIT_POLICY=PASSIVE, which slows things down (at least on non-SMT
CPUs) yet makes John a lot more friendly to other tasks.  With smaller
chunks - and thus more frequent switches between sequential and
parallel processing - passive waits (which involve syscalls) were way
too expensive to go for.

Alexander
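For reference, the passive wait policy mentioned above is selected
through the standard OMP_WAIT_POLICY environment variable (part of the
OpenMP specification; PASSIVE asks idle threads to block in the kernel
rather than spin-wait).  Reusing the binary name and options from the
test run above, an invocation might look like:

```shell
# PASSIVE trades some latency at thread-join points for yielding the
# CPU to other tasks while OpenMP threads are idle.
OMP_WAIT_POLICY=PASSIVE ./john-omp-des-3-96 -e=double --salts=-2 pw-fake-unix
```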