Date: Tue, 5 May 2015 15:48:13 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

On Tue, May 05, 2015 at 07:30:03PM +0800, Lei Zhang wrote:
> On Apr 25, 2015, at 8:34 PM, Solar Designer <solar@...nwall.com> wrote:
> >
> >> Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (240xOMP) DONE
> >> Raw:    17976 c/s real, 75.5 c/s virtual
> >
> > This is very poor speed. Needs to be investigated.
>
> Out of curiosity, I used Intel VTune to profile this self-test, and got an execution time distribution table:

Thank you! I think we'll want you to try VTune on other formats as well,
and to see lower-level bottlenecks as well (cache misses, etc.) (But for
phpass you'd proceed with that later - need to deal with the immediate
bottleneck first.)

> Function             CPU Time
> -----------------------------------------
> [libiomp5.so]        248.451s
> [vmlinux]             22.605s
> [john]                 6.882s
> [libc-2.14.90.so]      0.627s
> [libcrypto.so.10]      0.171s
> [Others]               0.067s
>
> The program spends most of its time in libiomp5.so, which I guess is where inter-thread synchronization happens. I think this poor speed results from the high synchronization overhead.

Yes, it appears so.

> Actually, by setting OMP_NUM_THREADS to smaller values, I could get better results than the above.

OK. The first thing I'd try is tuning OMP_SCALE. Perhaps the current
value is way too low, resulting in the threads needing synchronization
too often. Please experiment with it. Please stay at 240 threads.

> It appears that the default OMP_NUM_THREADS=240 isn't optimal for MIC, as the synchronization overhead is too high. Maybe we should tune OMP_NUM_THREADS individually for each format, just like OMP_SCALE.

There may be formats where we'd actually want to run fewer than 240
threads because of limited cache size, but phpass is almost certainly
not one of those. All of those speeds you posted are ridiculously low.
There's little point in going from 18k to 24k c/s when the target speed
is 500k+ c/s. If md5crypt currently achieves 864932 c/s as you reported,
then phpass at its $P$9 setting that we use for benchmarking should be no
worse than half that speed, so at least 432466 c/s. And the target is
higher.

Alexander