Date: Mon, 03 Mar 2014 05:08:01 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Reload pot file

On 2014-03-03 03:42, Solar Designer wrote:
>> A faster and safer solution would be to just re-process pot file using
>> existing functions. We miss the opportunity to reload the input files
>> containing hashes to crack but that was never my main goal anyway. The
>> worst problem seems to be the database used during initial load is not
>> exactly the same as the one ultimately used. Perhaps that doesn't
>> necessarily matter?
>
> I don't understand what you mean by "the database used during initial
> load is not exactly the same as the one ultimately used".  Can you
> please clarify which aspect(s) you're referring to here?  I'd like to
> comment on this, but as it is I am just confused.

This comment (I don't fully understand the implications yet):

struct db_password {
	...
/* After loading is completed: pointer to next password hash with
 * the same salt and hash-of-hash.
 * While loading: pointer to next password hash with the same
 * hash-of-hash. */
	struct db_password *next_hash;
	...

And these:

struct db_main {
	...
/* Salt and password hash tables, used while loading */
	struct db_salt **salt_hash;
	struct db_password **password_hash;
	...

The latter two tables are freed in ldr_fix_database(). I thought I'd 
need to re-allocate and rebuild them (from the running DB) before 
calling the existing ldr_load_pot_file(), and then call 
ldr_fix_database() again afterwards. Does that make sense?
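
Roughly this, where ldr_rebuild_hash_tables() is a made-up name for 
the part I haven't worked out (re-populating the two tables from the 
live db and restoring the "while loading" meaning of next_hash):

static void ldr_resync_pot(struct db_main *db, char *potname)
{
/* Hypothetical helper: re-create salt_hash/password_hash and
 * re-link next_hash the way the loader expects, undoing what
 * ldr_fix_database() did after the initial load. */
	ldr_rebuild_hash_tables(db);

/* The existing pot loader can then mark newly cracked hashes. */
	ldr_load_pot_file(db, potname);

/* And this prunes the cracked entries and frees the tables again. */
	ldr_fix_database(db);
}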


A different approach - and maybe quicker, unless the above is simpler 
than I imagine - would be to do it more like cracker.c does when 
cmp_exact() returns true: process each "hash:plain" line into 
binaries, salts, sources and plains as if it came from a running 
format after a crack loop. This might be simpler, but I haven't 
thought it through yet.
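
Something like this sketch, leaving out the canonicalization that 
valid() and split() would have to do first (in practice we'd also 
want to match the salt via format->methods.salt() before comparing 
binaries, instead of walking every salt):

static void pot_line_as_guess(struct db_main *db, char *ciphertext,
	char *plain)
{
	struct fmt_main *format = db->format;
	void *binary = format->methods.binary(ciphertext);
	struct db_salt *salt;
	struct db_password *pw;

	for (salt = db->salts; salt; salt = salt->next)
	for (pw = salt->list; pw; pw = pw->next)
		if (!memcmp(binary, pw->binary,
		    format->params.binary_size)) {
			/* Do here what crk_process_guess() does:
			 * log the guess and remove pw from the db. */
		}
}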


> Re-reading the pot is more generic since it also supports MPI and
> independent invocations of john (e.g., if someone manually invokes john
> with multiple wordlists one after another while also running it in
> incremental mode, the incremental run's john would remove the hashes
> cracked by the wordlist runs).  So even if the shared memory approach
> above would happen to work well for --fork, I realize there may be
> demand for re-reading the pot anyway.  So you may implement that in
> jumbo, and leave the shared memory for me to eventually experiment with,
> or you may try the shared memory thing yourself if you like.

The shared memory stuff sounds cool but I'll leave that to you. OTOH I 
have some plans for trying to actually *use* MPI a little more - and see 
if we can do something cool without harming performance. But I think 
nothing could obsolete a generic reload feature.


> Oh, and when you re-read, you can start reading from a previously
> recorded offset (the last re-read's pot file size).  Then it may actually
> be fast.

Right, so it will be more of a re-sync than a reload. That part will 
be trivial once the non-trivial stuff is taken care of %-)  There are 
variants of this - for example, when we write a new entry to the pot 
file, we can detect that someone else wrote in between, and trigger a 
re-sync under certain conditions.
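
For example (pot_sync_pos and process_pot_line() are invented names 
here, and a pot file that shrank would need extra handling):

static long pot_sync_pos;	/* pot file size at our last pass */

static void pot_resync(struct db_main *db, char *potname)
{
	FILE *f = fopen(potname, "r");
	char line[LINE_BUFFER_SIZE];

	if (!f)
		return;
/* Only parse what others appended since the last pass. */
	if (!fseek(f, pot_sync_pos, SEEK_SET))
		while (fgets(line, sizeof(line), f))
			process_pot_line(db, line);
	pot_sync_pos = ftell(f);
	fclose(f);
}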

magnum
