john-users - Re: Incremental mode progress and ETA

Message-ID: <4D64893D.8050101@bredband.net>
Date: Wed, 23 Feb 2011 05:12:45 +0100
From: magnum <rawsmooth@...dband.net>
To: john-users@...ts.openwall.com
Subject: Re: Incremental mode progress and ETA

On 02/23/2011 01:11 AM, Solar Designer wrote:
> On Wed, Feb 23, 2011 at 12:50:09AM +0100, magnum wrote:
>> As a spinoff of recent experiments with better MPI splitting, I ended up
>> writing a progress (percentage) and ETA patch for Incremental mode.
>>
>> It is dead simple and works just fine but I'd like to improve it if
>> possible. It calculates the total number of possible candidates from
>> real_count and this seems to end up with a valid figure in all
>> situations. So how many have we tried? The current solution is just a
>> 64-bit counter that increments for every candidate produced.
>
> The problem here is that you have to store that counter in .rec, which
> is a change of the file format.  I am going to make such a change anyway
> (need that counter for other reporting to be added anyway), but along
> with reworking the incremental mode, not on its own.

Great, is that planned soon, like in 1.7.7 or more like in some distant 
future?

>> Now to the actual question: Is there by any chance some kind of
>> reasonably light formula that could be used to calculate this number
>> instead, from the various parameters and variables already used in the
>> process?
>
> Maybe, but like I said I am going to add the counter anyway, for all
> modes at once - but along with the incremental mode rework to avoid
> changing the .rec file format twice.

Understood. Until this happens, calculating it when needed would 
mitigate the .rec file problem.

>> Or is that question the reason we don't already have this
>> feature?
>
> No, that's not the reason.  The real reason/excuse is in the FAQ.

Yes... what year did you write that? ;-)  While your argument is true
for -inc:all, I can easily run -inc:alpha or even -inc:alnum to
completion against many types of hashes if there aren't too many salts,
especially using OMP or MPI.
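
For a rough sense of why the smaller charsets are feasible, here is a
minimal sketch that sizes a keyspace as the sum of charset_size^len over
all lengths. The function name keyspace_size and its parameters are
illustrative assumptions. This is not how the patch derives the total
from real_count, and a real charset file need not cover the full
character set at every position.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of John the Ripper.
 * Returns the number of strings over an alphabet of charset_size
 * characters with lengths min_len..max_len inclusive, or UINT64_MAX
 * if the result does not fit in 64 bits.
 */
static uint64_t keyspace_size(unsigned charset_size,
                              unsigned min_len, unsigned max_len)
{
    uint64_t total = 0;

    for (unsigned len = min_len; len <= max_len; len++) {
        uint64_t n = 1; /* charset_size^len, built up step by step */

        for (unsigned i = 0; i < len; i++) {
            /* Check before multiplying so n never wraps around. */
            if (charset_size != 0 && n > UINT64_MAX / charset_size)
                return UINT64_MAX;
            n *= charset_size;
        }

        /* Check before adding for the same reason. */
        if (total > UINT64_MAX - n)
            return UINT64_MAX;
        total += n;
    }
    return total;
}
```

Assuming a 26-letter alphabet and lengths 1 through 8,
keyspace_size(26, 1, 8) gives 217180147158 candidates, which fits
comfortably in a 64-bit counter.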

>> Unless there is such a formula, with all cool patches pouring in now I
>> think I'll release it as-is, not just for MPI but as a separate patch.
>
> I am concerned that this might result in people running into issues with
> restoring sessions.  In particular, what if one wants to go "back" to a
> version of John without your patch (maybe a newer version of John even,
> just without the patch)?  Will they not be able to restore the session
> because of your .rec file format change?

I'll keep it private for now then. There are currently no problems going
back and forth (my patch deals with it and the unpatched version never
reads the extra line), but future format changes would of course break
things (or at least you'd be forced to deal with my line for life).

>> We could have a Makefile feature for disabling it in case it ends up
>> hitting performance in some environments. On my gear it doesn't seem to
>> have any measurable impact on performance even with the very fastest
>> hashes, in fact it sometimes ends up faster.
>
> That's curious.  Perhaps you did the smart thing of not incrementing the
> counter for every candidate password, but updating it once per crypt_all()?

No, the current version is a pure inc.c thing - I do it right before these:

     if (!ext_mode || !f_filter || ext_filter_body(key_i, key = key_e))
     if (crk_process_key(key)) return 1;

I did expect a measurable hit when I first tried it, but nothing! I just
rechecked it with fresh compiles. Patched or not, raw-md5 inc:digits runs
to completion in 25 seconds with the very same c/s reported. The slight
speedup happened in certain cases with DES, which is perhaps no weirder
than the OMP4/OMP7 mystery :)
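
A minimal sketch of the counter approach described above: a 64-bit count
of candidates tried, a known total, and percentage and ETA derived from
them. The struct and function names are hypothetical and not taken from
the patch or from John's source. The ETA assumes the candidate rate stays
constant, which is a simplification.

```c
#include <stdint.h>

/* Hypothetical progress state, not John the Ripper's actual layout. */
struct progress {
    uint64_t tried; /* candidates generated so far */
    uint64_t total; /* total candidates in the keyspace */
};

/* Called once per candidate, i.e. where the patch bumps its counter. */
static void progress_tick(struct progress *p)
{
    p->tried++;
}

/* Percentage done, or 0 if the total is unknown (zero). */
static double progress_percent(const struct progress *p)
{
    if (p->total == 0)
        return 0.0;
    return 100.0 * (double)p->tried / (double)p->total;
}

/*
 * Estimated seconds remaining after elapsed_sec seconds of work,
 * assuming a constant rate. Returns -1 if no estimate is possible.
 */
static double progress_eta(const struct progress *p, double elapsed_sec)
{
    if (p->tried == 0 || p->tried > p->total || elapsed_sec <= 0.0)
        return -1.0;

    double rate = (double)p->tried / elapsed_sec; /* candidates/sec */
    return (double)(p->total - p->tried) / rate;
}
```

Only the tried field would need to go into the session file for this to
survive a restore. The total can be recomputed at startup.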

>> However there is another advantage with calculating it from already used
>> data: we could finally get to know how far that low-prio job we've been
>> running for ages has progressed (my current patch disables progress
>> output unless the counter was saved in an extra line in the .rec file).
>> Come to think of it, if there is an easy-but-pretty-slow way of
>> calculating it, I could keep the counter and only use the formula when
>> restoring an old job that lacks the counter.
>
> This makes sense.  Yes, you could implement it.  Sorry, I am not in the
> mood of producing pseudo-code (or C code) for you now.  I feel that I
> have other priorities.

I fully understand that. I don't expect to be able to reverse-engineer
it though.

Does anyone else in the community here have some clues?

Thanks,
magnum