Date: Sun, 18 Dec 2011 02:46:22 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: DES BS + OMP improvements, MPI direction

On Sat, Dec 17, 2011 at 09:21:19AM -0700, RB wrote:
> I've not been terribly active here lately, but certainly keep up with
> the state of things and "evangelize" the use of JtR where appropriate.

Wow.

> That said, I wanted to congratulate and thank you guys for the recent
> speed improvements.  Two specific items of note: the key setup for DES
> bitslice and the proliferation of OpenMP enablement for various
> hashes.  By itself, the DES improvement speeds up LM on benchmarks by
> as much as 2x on one of my systems (W3565).

It'd be nice if you add some benchmark results to:

http://openwall.info/wiki/john/benchmarks

> In concert with OpenMP, I
> see between 120m and 180m c/s over four threads in live cracking
> (using HT appears to dampen my specific performance),

Yes, LM/OpenMP scales poorly.  4 threads is usually just 1.5 times
faster than non-OpenMP.  But all of these are very fast, almost as fast
as dummy.  You might want to use a non-OpenMP build, which should do
something around 60M c/s per process on your system.  With 8 processes
(e.g. with MPI), you'll get around 250M c/s total.

> with completion estimates in the 36-hour (!!!) range.

Actually, 120M c/s would result in completion in 17 hours (for printable
US-ASCII), so perhaps you're not actually getting that speed.

> As bad as LM is, it's still
> useful to those who have access to the SAM database.  OMP is just a
> huge help in general.  I'm not seeing a linear increase in speed on my
> systems, but it's enough to stick with 4/8/16-core machines and not
> futz about trying to set up an MPI network any more.

Like I said, LM/OpenMP scales very poorly - almost up to the point where
it's unreasonable to use OpenMP.  For other hash types, it's much better
(90% efficiency as compared to multiple separate processes is common).

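For reference, the 17-hour figure above can be reproduced with a quick
calculation.  This is only a sketch under assumptions that are not
spelled out in the message: that LM uppercases its input, so the 95
printable US-ASCII characters collapse to 69 distinct ones, and that
each 7-character half is searched for all lengths from 0 to 7.  The
function names are made up for illustration.

```python
# Rough check of the LM exhaustive-search estimate.
# Assumptions (not from the original message): LM uppercases passwords,
# so 95 printable US-ASCII characters collapse to 69 distinct ones, and
# each hash half covers candidate lengths 0 through 7.


def lm_keyspace(charset_size=69, max_len=7):
    """Number of candidates of length 0..max_len over the charset."""
    return sum(charset_size ** n for n in range(max_len + 1))


def hours_to_exhaust(rate_cps, keyspace=None):
    """Hours needed to try every candidate at rate_cps candidates/second."""
    if keyspace is None:
        keyspace = lm_keyspace()
    return keyspace / rate_cps / 3600.0


def rate_for_hours(hours, keyspace=None):
    """Candidates/second needed to finish the keyspace in the given hours."""
    if keyspace is None:
        keyspace = lm_keyspace()
    return keyspace / (hours * 3600.0)


if __name__ == "__main__":
    print("keyspace: %d" % lm_keyspace())
    print("hours at 120M c/s: %.1f" % hours_to_exhaust(120e6))
    print("c/s implied by 36 hours: %.1fM" % (rate_for_hours(36) / 1e6))
```

Under these assumptions the keyspace is about 7.6e12 candidates, which
at 120M c/s works out to roughly 17.5 hours.  Run the other way, a
36-hour estimate would correspond to somewhere around 58M c/s.
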
> All that said, after reading through the OMP additions they don't
> appear to be terribly invasive.  I'm sure there's some necessary code
> grooming prior to the seemingly small insertion of the pragma, but
> overall it seems small.  I've not looked at the MPI stuff since magnum
> took that over (thanks!), but that type of positioning is exactly
> where I'd have wanted to go eventually.  Being partly lazy, are those
> of you looking at the MPI implementation considering using that same
> approach?  Assuming the network latency and throughput don't
> interfere, that could certainly help solve the scaling issues john-MPI
> had at the sunset of my maintainership.

Are you suggesting that we'd apply MPI at the same low level where we
currently apply OpenMP?  No, that doesn't sound like a good idea to me,
except maybe for the slowest hash/cipher types only (BF, MSCash2, RAR,
crypt when run against SHA-crypt), where it might be acceptable.  This
is not currently planned, though.

Rather, I think we should be moving in the opposite direction:
introducing easy-to-use parallelization at higher levels, where it can
be more generic and more efficient.  Somewhat similar to what the MPI
support in -jumbo does, but without MPI.

Alexander