Date: Tue, 5 May 2015 15:48:13 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: [GSoC] JtR SIMD support enhancements

On Tue, May 05, 2015 at 07:30:03PM +0800, Lei Zhang wrote:
> > On Apr 25, 2015, at 8:34 PM, Solar Designer <solar@...nwall.com> wrote:
> > 
> >> Benchmarking: phpass ($P$9) [phpass ($P$ or $H$) 128/128 MIC 16x1]... (240xOMP) DONE
> >> Raw:	17976 c/s real, 75.5 c/s virtual
> > 
> > This is very poor speed.  Needs to be investigated.
> 
> Out of curiosity, I used Intel VTune to profile this self-test, and got an execution time distribution table:

Thank you!  I think we'll want you to try VTune on other formats too,
and to look at lower-level bottlenecks (cache misses, etc.) as well.
(For phpass you'd proceed with that later, though - we need to deal with
the immediate bottleneck first.)

> Function              CPU Time
> -----------------------------------------
> [libiomp5.so]         248.451s
> [vmlinux]             22.605s
> [john]                6.882s
> [libc-2.14.90.so]     0.627s
> [libcrypto.so.10]     0.171s
> [Others]              0.067s
> 
> The program spends most of its time in libiomp5.so, which I guess is where inter-thread synchronization happens. I think this poor speed results from the high synchronization overhead.

Yes, it appears so.

> Actually by setting OMP_NUM_THREADS to smaller values, I could get better results than the above.

OK.  The first thing I'd try is tuning OMP_SCALE.  Perhaps the current
value is way too low, resulting in the threads needing synchronization
too often.  Please experiment with it.  Please stay at 240 threads.

> It appears that the default OMP_NUM_THREADS=240 isn't optimal for MIC, as the synchronization overhead is too high. Maybe we should tune OMP_NUM_THREADS individually for each format, just like OMP_SCALE.

There may be formats where we'd actually want to run fewer than 240
threads because of limited cache size, but phpass is almost certainly
not one of those.  All of those speeds you posted are ridiculously low.
There's little point in going from 18k to 24k c/s when the target speed
is 500k+ c/s.  If md5crypt currently achieves 864932 c/s as you
reported, then phpass at its $P$9 setting that we use for benchmarking
should be no worse than half that speed, so at least 432466 c/s.  And
the target is higher.

Alexander
