Date: Fri, 19 Sep 2014 21:06:35 +0400
From: Solar Designer <>
Subject: Re: Restart work on mask mode

Hi Sayantan,

On Fri, Sep 19, 2014 at 08:16:58PM +0530, Sayantan Datta wrote:
> I am glad to inform that I'm resuming my work on mask mode and to begin
> with, I'm trying to find the bottleneck on cpu portion of mask mode which
> is quite slower compared to incremental mode.

There isn't really a single bottleneck there.  It's just slower code.

> To my surprise I can't get anything faster than 7.2 Mp/s(compared to 15Mp/s
> on inc mode)

I'm puzzled as to why the speeds are this low for you.  Here's what I am
getting on one core of an FX-8120:

$ cat pw-dummy
$ ./john -inc -min-len=8 -max-len=8 pw-dummy 
Loaded 1 password hash (dummy [N/A])
Warning: no OpenMP support for this hash type, consider --fork=8
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:07 0.00% (ETA: 2017-08-24 19:32) 0g/s 62951Kp/s 62951Kc/s 62951KC/s kameto01..kamets99
0g 0:00:00:13 0.00% (ETA: 2018-01-11 20:55) 0g/s 63118Kp/s 63118Kc/s 63118KC/s pwektu10..pwekhrd2
Session aborted
$ ./john -mask='?a?a?a?a?a?a?a?a' pw-dummy 
Loaded 1 password hash (dummy [N/A])
Warning: no OpenMP support for this hash type, consider --fork=8
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:06 0.00% (ETA: 2022-01-06 13:12) 0g/s 24760Kp/s 24760Kc/s 24760KC/s    "+W'a..   "+W)L
0g 0:00:00:14 0.00% (ETA: 2023-02-13 10:53) 0g/s 25004Kp/s 25004Kc/s 25004KC/s    $<XYC..   $<X[.
Session aborted

As you can see, it's 63M vs. 25M, so still a substantial difference, but
both are much higher speeds than yours.

> even with this simple password generation loop:
> while(!crk_process_key("a"));

Are you saying this gives you only 7.2 Mp/s?  That's puzzling.  What
hash type?  Or are you using --stdout (if so, that's the bottleneck)?

> replacing the original code:
> while ((word = rpp_next(&ctx))) {
>         if (options.node_count) {
>             seq++;
>             if (their_words) {
>                 their_words--;
>                 continue;
>             }
>             if (--my_words == 0) {
>                 my_words =
>                     options.node_max - options.node_min + 1;
>                 their_words = options.node_count - my_words;
>             }
>         }
>         if (ext_filter(word))
>             if (crk_process_key(word))
>                 break;
>  }

As you can see, this calls rpp_next() to obtain each new candidate
password.  This differs from incremental mode's code, which updates its
own local buffer directly.  Further efficiency differences can be easily
seen if you compare what's inside rpp_next() vs. what happens when
incremental mode only updates the last character's index.

Rather than look for "bottlenecks" here, we simply need to write mask
mode code that is at least as efficient as incremental mode's code.
It will need to stop using rpp.  My reuse of rpp for early mask mode
code was to demonstrate how it could be implemented.  rpp was never
meant to be used in performance-critical code, so it wasn't optimized to
that extent (nor do I recommend optimizing it, as that would complicate
it; instead, I recommend writing new code specific to mask mode).
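
For illustration only, here is a minimal sketch of what such
mask-specific code might look like, assuming a fixed-length mask with
one character set per position.  mask_generate(), key_fn and
remember_key() are made-up names, not existing JtR functions; key_fn
merely stands in for crk_process_key(), and node distribution and
external filters are left out.  The point is just that the inner loop
rewrites only the last character of a local buffer, and earlier
positions are touched only on carry:

```c
#include <stddef.h>
#include <string.h>

#define MASK_MAX_LEN 32

/* Hypothetical stand-in for crk_process_key(): nonzero return means stop. */
typedef int (*key_fn)(const char *key);

/* Demo callback (hypothetical): remembers the most recent candidate. */
static char last_key[MASK_MAX_LEN + 1];

static int remember_key(const char *key)
{
    strncpy(last_key, key, MASK_MAX_LEN);
    last_key[MASK_MAX_LEN] = 0;
    return 0; /* never ask to stop */
}

/*
 * Enumerate every candidate for a fixed-length mask.  charsets[i] is a
 * NUL-terminated string of the characters allowed at position i.
 * Returns the number of candidates passed to process_key().
 */
static unsigned long mask_generate(const char *const *charsets, int len,
    key_fn process_key)
{
    char key[MASK_MAX_LEN + 1];
    size_t idx[MASK_MAX_LEN], count[MASK_MAX_LEN];
    unsigned long generated = 0;
    int pos;

    if (len < 1 || len > MASK_MAX_LEN)
        return 0;
    for (pos = 0; pos < len; pos++) {
        count[pos] = strlen(charsets[pos]);
        if (!count[pos])
            return 0; /* empty set: nothing to generate */
        idx[pos] = 0;
        key[pos] = charsets[pos][0];
    }
    key[len] = 0;

    for (;;) {
        const char *last = charsets[len - 1];
        size_t i;

        /* Hot loop: only the last character of the buffer changes. */
        for (i = 0; i < count[len - 1]; i++) {
            key[len - 1] = last[i];
            generated++;
            if (process_key(key))
                return generated;
        }

        /* Carry: advance earlier positions like an odometer. */
        pos = len - 2;
        while (pos >= 0 && ++idx[pos] >= count[pos]) {
            idx[pos] = 0;
            key[pos] = charsets[pos][0];
            pos--;
        }
        if (pos < 0)
            return generated; /* every position wrapped: done */
        key[pos] = charsets[pos][idx[pos]];
    }
}
```

With charsets { "ab", "01" } and len 2, this would pass a0, a1, b0, b1
to the callback, in that order.  Real code would also need to handle
the node splitting and ext_filter() calls seen in the loop you quoted,
preferably outside of the inner loop.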

> Manually, for now,  I'm unable to find the bottleneck. So does anyone knows
> if there is any good cpu profiler for linux64 machines ?

To determine which functions most CPU time is spent in, gprof would work
fine.  To determine CPU instruction sequences taking excessive amounts
of time, you could use something like Intel's VTune.  However, I really
don't think this is what you need to optimize mask mode.  It is obvious
that rpp_next() is slower than inc.c's typical internal loop - there's
no need for a profiler for that.

