john-dev - Re: additional memory usage questions, and the db

Message-ID: <20120714202235.GA2579@openwall.com>
Date: Sun, 15 Jul 2012 00:22:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: additional memory usage questions, and the db_password structures

On Sat, Jul 14, 2012 at 02:34:19PM -0500, jfoug wrote:
> There are other areas of this which can have a significant impact on memory
> usage for a large count of inputs (i.e. reducing this overhead would improve
> the scalability of JtR).
>
> These 2 are the next_hash pointer, and words pointer.
>
> For these 2, the words pointer is only used for single runs. So why not
> build a structure that is shorter than the 'original', OR mem_alloc_tiny the
> db_password items sizeof (list*) less bytes when running in non-single mode?

We're already doing that.  We also do it for the "login" pointer when
memory saving is enabled.  See pw_size in ldr_load_pw_line().
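
To illustrate the idea only (this is not the actual loader code): a
minimal sketch, assuming a simplified stand-in for struct db_password
with the optional pointers placed last.  The names pw_entry,
pw_entry_size() and pw_entry_alloc(), and the exact field order, are
hypothetical, and plain calloc() is used here in place of
mem_alloc_tiny().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-in for struct db_password.
 * The optional fields are placed last so that they can be cut off.
 */
struct pw_entry {
	struct pw_entry *next;
	struct pw_entry *next_hash;
	void *binary;
	char *source;
	char *login;	/* assumed droppable when memory saving is enabled */
	void *words;	/* needed by single crack mode only */
};

/* Number of bytes to allocate per entry for the given run settings */
static size_t pw_entry_size(int single_mode, int save_mem)
{
	if (single_mode)	/* single crack mode needs every field */
		return sizeof(struct pw_entry);
	if (save_mem)		/* drop both "login" and "words" */
		return offsetof(struct pw_entry, login);
	return offsetof(struct pw_entry, words);	/* drop "words" only */
}

/* Allocate one zeroed entry of the (possibly truncated) size */
static struct pw_entry *pw_entry_alloc(size_t size)
{
	assert(size >= offsetof(struct pw_entry, login));
	return calloc(1, size);
}
```

With such a layout, code must never touch a field that lies beyond the
size that was allocated for the current run.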

> Is that information even used other than in a single run?

The words list is specific to single crack mode.

> As for the other pointer (the next_hash), couldn't that be moved outside of
> each object, and simply be an array of db_password pointers??  My
> understanding was this is used in salted only, and only when there are
> multiple salts, thus keeping the ability to walk the salts.  However, why
> does each and every data object require this pointer?  It may simply be a
> misunderstanding on my part of just what the next_hash is.

Yes, you seem to misunderstand what next_hash is.  BTW, I edited the
comment on it in my patch introducing source().

What we can in fact do is reduce this to 1 pointer.  We only use either
"next" or "next_hash" during cracking, not both at once.  See the big
if/else in crk_password_loop().  We use both in the loader, but we can
try to come up with an algorithm that would not require that.  At the
very least, one of these can be moved to the cold part of the structure.
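
One possible shape for the 1-pointer variant, as a rough sketch only:
the two links share storage through a union, which works only if the
loader no longer needs both at once.  The struct names, the "link"
member and chain_length() are hypothetical and are not taken from the
tree.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as discussed above: two separate links per loaded hash */
struct two_links {
	struct two_links *next;
	struct two_links *next_hash;
	void *binary;
};

/* Hypothetical alternative: both links share the same storage */
struct one_link {
	union {
		struct one_link *next;		/* walking the full list */
		struct one_link *next_hash;	/* walking a hash bucket chain */
	} link;
	void *binary;
};

/* Bytes saved per loaded hash by sharing the link storage */
static size_t bytes_saved_per_hash(void)
{
	return sizeof(struct two_links) - sizeof(struct one_link);
}

/*
 * Walk a chain.  The caller decides, once per run, which meaning the
 * shared link has; only one of the two is ever in use at a time.
 */
static size_t chain_length(const struct one_link *head, int use_hash)
{
	size_t count = 0;

	while (head) {
		count++;
		head = use_hash ? head->link.next_hash : head->link.next;
	}
	return count;
}
```

On a typical 64-bit build this saves 8 bytes per hash loaded.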

> However, on a 64 bit system, these 2 pointers are 16 bytes PER candidate of
> memory used.  And for a non-salted (or single salt??) run, that is not
> single, this memory is fully wasted, I think.

This applies equally to saltless and salted hashes, regardless of salt
count.  The 16 bytes are per hash loaded.
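
To put a number on that, here is a tiny sketch of the arithmetic.  The
function name link_overhead_bytes() is hypothetical, and the 10 million
hashes in the note after it are just an example figure.

```c
#include <assert.h>

/*
 * Memory taken by the two link pointers ("next" and "next_hash")
 * across all loaded hashes, for a given pointer size in bytes.
 */
static unsigned long long link_overhead_bytes(unsigned long long hashes,
    unsigned int pointer_size)
{
	assert(pointer_size > 0);
	return hashes * 2ULL * pointer_size;
}
```

For example, link_overhead_bytes(10000000ULL, 8) gives 160000000, that
is, 160 million bytes for 10 million hashes with 8-byte pointers.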

Anyhow, there's no pressing need to work on these enhancements right
now, so I don't intend to.  I might revisit this after the end of
summer.  My work on prepare() and source() was not so much for us to
gain these enhancements now, but rather to allow for further changes
to the formats interface (unrelated to these changes, but actually
desirable before the end of summer) without having the core and jumbo
trees each deviate from a common base in their own directions.  With the
changes I got in, I hope this can be avoided - that is, I can experiment
with further formats interface changes in a revision of core (not to be
committed yet), whereas bleeding will use the same base interface.

Alexander