Date: Fri, 17 Apr 2015 19:18:50 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Advice on proposal: John the Ripper jumbo robustness

On 17.04.2015 12:01, Kai Zhao wrote:
> Note: compile without asan and afl
>
> $ ./configure
> $ make
> $ echo garbage > test.pw
> $ time ../john --format=7z test.pw
> No password hashes loaded (see FAQ)
>
> real    0m0.041s
> user   0m0.038s
> sys     0m0.004s
>
> I profiled the run with gprof to get the call count and execution time of
> each function; the output file is attached.
>
> The cfg_get_section() function takes most of the time.

Thanks, that's much, much better.

> This is why
> we get a 7x speed-up when john.conf is simple, such as "[Options]".
>
> It is interesting that cfg_get_section() is called 16080 times. Most of
> the calls come from dynamic_IS_VALID(), which is called 10000 times.
>
> We can optimize the dynamic_Register_formats() function, which calls
> dynamic_IS_VALID() 10000 times. Below is part of the code:
>
> int dynamic_Register_formats(struct fmt_main **ptr)
> {
>      ...
>      for (count = i = 0; i < 5000; ++i) {
>          if (dynamic_IS_VALID(i, 1) == 1)
>              ++count;
>      }
>      // Ok, now we know how many formats we have.  Load them
>      pFmts = mem_alloc_tiny(sizeof(pFmts[0])*count, MEM_ALIGN_WORD);
>      for (idx = i = 0; i < 5000; ++i) {
>          if (dynamic_IS_VALID(i, 1) == 1) {
>              if (LoadOneFormat(i, &pFmts[idx]) == 0)
>                  --count;
>              else
>                  ++idx;
>          }
>      }
>      ...
> }
>
> The dynamic_Register_formats() function therefore triggers 10000 calls to
> cfg_get_section(), and each call searches the section list from
> beginning to end. The current john.conf has a lot of sections, so every
> search is slow.

I see. Indeed, this approach is quite slow.

> One way to optimize the dynamic_Register_formats() function is to
> traverse all the sections once and record the result (valid or not) for
> every dynamic section. This uses a little more memory, but
> it reduces the 10000 calls to a single pass. I think it speeds up john
> without changing the config file, and it helps beyond fuzz testing.
>
> Do you agree with me? I am going to implement this change.

Yes, it would be nice to speed it up but it's not required.
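
For illustration, here is a minimal sketch of the single-pass idea. It is not john's actual code: struct section, scan_dynamic_sections() and the "list.generic:dynamic_" prefix are assumptions made up for this example, and it only records which section names are present, while the real dynamic_IS_VALID() checks more than that. Only the 5000 bound is taken from the code quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DYNAMIC_MAX 5000 /* loop bound taken from the quoted code */

/* Hypothetical stand-in for the config section list (not john's struct). */
struct section {
    const char *name;        /* assumed to be lower-cased already */
    struct section *next;
};

/* Assumed naming scheme for dynamic sections. */
static const char dyn_prefix[] = "list.generic:dynamic_";

/*
 * Walk the section list once and set valid[n] = 1 for every section
 * named <dyn_prefix><n> with 0 <= n < DYNAMIC_MAX.
 * Returns the number of distinct dynamic numbers found.
 */
static int scan_dynamic_sections(const struct section *head,
                                 unsigned char valid[DYNAMIC_MAX])
{
    const size_t plen = sizeof(dyn_prefix) - 1;
    int count = 0;

    assert(valid != NULL);
    memset(valid, 0, DYNAMIC_MAX);

    for (; head; head = head->next) {
        const char *digits;
        char *end;
        long n;

        if (strncmp(head->name, dyn_prefix, plen) != 0)
            continue;                 /* not a dynamic section */
        digits = head->name + plen;
        if (*digits < '0' || *digits > '9')
            continue;                 /* no number after the prefix */
        n = strtol(digits, &end, 10);
        if (*end != '\0' || n >= DYNAMIC_MAX)
            continue;                 /* trailing junk or out of range */
        if (!valid[n]) {              /* count duplicates only once */
            valid[n] = 1;
            ++count;
        }
    }
    return count;
}
```

With such a table, both loops in dynamic_Register_formats() could test valid[i] instead of searching the whole section list on every iteration, so the list is walked once instead of 10000 times.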

In the past, I've tried to sort hashes by running john against every
hash in a loop. It was slow. But this is quite an exotic workflow.

For fuzzing, we can bypass these numerous cfg_get_section() calls:
either as Frank described, or by using an empty config file, or by
defining DYNAMIC_DISABLED.

So... If you can easily optimize it then go ahead. But don't spend much
time on it. Then please go further and find where the next bottleneck
is (with dynamic format registration bypassed one way or another). We
are still at ~100 exec/s. It would be nice to get 1000-2000 exec/s.

-- 
Alexander Cherepanov
