john-dev - RE: magnum-jumbo and magnum-bleeding (NOT J7), and the source() function

Message-ID: <003501cd61ee$429c1b50$c7d451f0$@net>
Date: Sat, 14 Jul 2012 13:27:00 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: magnum-jumbo and magnum-bleeding (NOT J7), and the source() function

>From: Solar Designer [mailto:solar@...nwall.com]
>
>On Sat, Jul 14, 2012 at 11:16:34AM -0500, jfoug wrote:
>> Lol, magnum and I worked through many of these items already, and on
>> jumbo, there are even MORE tricky situations than in core, but you
>> have pretty much reverted back to one of our earlier attempts that was
>> significantly more limited in usefulness than what we obtained in the
>end.
>
>I think that starting with a more limited interface and implementation
>and then enhancing it with separate changes/commits is fine.  As I wrote
>earlier, I did not particularly like some specifics of the source()
>interface you were using.  

Yes, a lot of it was hackish, but 'simplifying' it with individual params
is no better, IMHO.  Now you have to keep adding params and changing the
interface, vs. just updating a structure.
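
To make the structure-vs-params point concrete, here is a minimal sketch.
The names (struct source_ctx, example_source) are hypothetical and are not
the interface in core, jumbo, or bleeding; the "salt$hex" output layout is
likewise just an assumption for illustration.  The idea is only that a
callback taking a struct pointer can gain new fields later without its
signature changing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical context struct (not John's real API).  New members can be
 * appended later without touching the callback signature below. */
struct source_ctx {
    const void *binary;    /* decoded hash bytes */
    size_t binary_size;    /* number of bytes in binary */
    const char *salt;      /* printable salt, or NULL for unsalted formats */
};

/* Rebuild a printable hash string from the context.
 * Assumed layout for this example: "salt$hexdigits", or just "hexdigits". */
static char *example_source(const struct source_ctx *ctx,
                            char *out, size_t out_size)
{
    const unsigned char *b = ctx->binary;
    size_t salt_len = ctx->salt ? strlen(ctx->salt) + 1 : 0; /* salt + '$' */
    char *p = out;
    size_t i;

    /* Caller must supply a buffer large enough for the whole string. */
    assert(out_size >= salt_len + 2 * ctx->binary_size + 1);

    if (ctx->salt)
        p += sprintf(p, "%s$", ctx->salt);
    for (i = 0; i < ctx->binary_size; i++)
        p += sprintf(p, "%02x", b[i]);
    *p = '\0';
    return out;
}
```

With individual params, supporting the salt would have meant changing the
prototype in every format; here an unsalted format simply leaves salt NULL.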

Also, since these are individual params, any change required for jumbo
means we also have to change the CORE formats in jumbo, making them
different from what they are in core.  Jumbo having to live within the
limitations of core is what has forced some of the hackish coding (like
globals, etc.) to be used in certain places within jumbo.

>So I opted for the simpler thing first, with
>intent to enhance it in a slightly different way from what you did.
>
>Doesn't this simpler interface cover all use cases you currently have in
>bleeding?  I thought you were not using this for any salted hashes yet,
>were you?

The interface I had worked perfectly for salted formats.  I have tested it
with several dynamic formats (testing only; this was never committed).
SapG at least (and I think others) is a salted format using source
(get_source).

It also works well within dynamic, which in and of itself adds a LOT more
requirements, because the format does not know at compile time just WHAT
is expected.

>Besides, with salted hashes we'd normally have a unique salt for almost
>every hash (unless the system generated salts poorly, which does
>happen), so we'd have a struct db_salt allocated anyway and thus the
>savings from not keeping the source string are relatively smaller.

That is why I stole a pointer. There was no salt reallocation. The original
salt did just what it did.  However, each hash 'knew' the salt, and always
had it available. This really turned out to be a key to getting the code for
get_source() to work well.  Thus, there was absolutely NO additional memory
used.  It was 100% memory savings.

>
>> I do like the hot/cold addition.  This keeps the hot part of binary
>> (which
>> source() requires to be fully available) in a much smaller working set
>> in the core loop, which was one shortcoming of my original version.
>
>Yes.  Note that I did not implement hot/cold yet, I merely considered it
>as a likely enhancement, thus only making changes that would be
>compatible with that enhancement later.  

This could be as large an improvement in memory reduction as source()
itself.  With source(), the enhancement of keeping only 4 bytes of binary
was lost.  Hot/cold will regain it, while still not having to allocate the
source each time.

>What I implemented so far is
>"inline" storage of very small binary hashes (up to 64 bits in a 64-bit
>John build) by reusing the "char *source" pointer to store them in
>formats where we don't need that pointer for its original purpose.  This
>currently works for LM hashes and it should work for a few more in jumbo
>(old MySQL, anything else?), but it wouldn't work e.g. for NTLM.
>
>Hot/cold separation is still planned, to be used for hashes larger than
>the machine's word size.

I personally think this usage is more 'hackish' than using the same pointer
to point to different 'type' objects, depending upon runtime conditions.
Now you have a pointer that sometimes is not a pointer.  Oh well, your call.
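
For illustration only, here is a sketch of what "a pointer that sometimes
is not a pointer" looks like when the dual use is spelled out as a union.
The union and helper names are hypothetical stand-ins, not the real struct
db_password fields, and this is not necessarily how core does it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the field under discussion.  Exactly one
 * member is meaningful for a given format, decided at runtime. */
union source_or_binary {
    char *source;                                /* ciphertext string */
    unsigned char inline_binary[sizeof(char *)]; /* small binary, in place */
};

/* A binary can be stored inline only if it fits in a pointer-sized slot
 * (8 bytes on a typical 64-bit build, 4 on a typical 32-bit build). */
static int fits_inline(size_t binary_size)
{
    return binary_size <= sizeof(char *);
}

/* Store a small binary in the slot instead of a pointer. */
static void store_inline(union source_or_binary *slot,
                         const void *binary, size_t size)
{
    assert(fits_inline(size));
    memset(slot, 0, sizeof(*slot));
    memcpy(slot->inline_binary, binary, size);
}

/* Compare a computed binary against the one stored inline. */
static int inline_equal(const union source_or_binary *slot,
                        const void *binary, size_t size)
{
    assert(fits_inline(size));
    return memcmp(slot->inline_binary, binary, size) == 0;
}
```

Either way, the code reading the slot has to know from the format which
member is live; the union only makes that explicit in the declaration.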

>One idea is to revise struct db_password such that it would either have
>explicit inline space for the hot portion (perhaps a 32-bit field) or
>have the hot portion right before or right after it (a fixed offset
>relative to struct start either way).  

Something like this 'could' be done.  One way is to make an array, and
know that this hash list (salt) uses this buffer.  Then simply use the index
into the array, for salt count.  I think inlining into the db_password is
not good, because then it is no longer a packed array of 32-bit values, but
an array of db_passwords with a small (32-bit) part of each being hot.
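
A rough sketch of the packed-array idea, under assumptions: a 16-byte
binary split into a 32-bit hot part and a 12-byte cold part, with one pair
of arrays per salt.  All names here (struct salt_hashes, struct cold_entry,
find_hash) are made up for the example and do not exist in John.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLD_SIZE 12 /* assumed: remainder of a 16-byte binary */

/* Cold data lives elsewhere, addressed by the same index as the hot value. */
struct cold_entry {
    unsigned char rest[COLD_SIZE];
};

/* Hypothetical per-salt hash list: hot[] is a packed array of 32-bit
 * values, so the scan in the core loop touches only 4 bytes per hash. */
struct salt_hashes {
    uint32_t *hot;           /* first 32 bits of each binary */
    struct cold_entry *cold; /* cold[i] belongs to hot[i] */
    int count;               /* number of hashes for this salt */
};

/* Return the index of the hash matching the computed binary, or -1. */
static int find_hash(const struct salt_hashes *s,
                     const unsigned char *computed)
{
    uint32_t first;
    int i;

    assert(s->count >= 0);
    memcpy(&first, computed, sizeof(first));
    for (i = 0; i < s->count; i++) {
        if (s->hot[i] != first)
            continue; /* common case: cold data is never touched */
        if (!memcmp(s->cold[i].rest, computed + sizeof(first), COLD_SIZE))
            return i;
    }
    return -1;
}
```

The cold array is only read after a 32-bit match, which is the working-set
benefit being discussed; a real version would likely use hash tables
rather than a linear scan.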

>"Right before" might be better
>since we already have some fields "right after" - or rather, they're
>struct fields for which we don't always allocate memory (sometimes the
>allocation is smaller than the declared struct size).  The cold portion
>would need to be accessed through a pointer, which would be a struct
>field - or it can be an integer offset relative to the struct's address
>(so that we'd be able to make it 32-bit even in a 64-bit build), but
>that's tricky.  Maybe those "right after" fields I mentioned should be
>moved to the cold portion (then it won't be limited to being a portion
>of the ciphertext).
>
>> Ok, you have enhanced the interface in some areas, but reduced
>> functionality in other areas.  I just question if there is a better
>> way and interface to do this, but now, it is 'locked down', due to
>> being implanted in core, and
>
>There's almost always either a better way to do something or a tradeoff
>involved, or both.
>
>> thus jumbo will be hamstrung.
>
>I don't feel that my initial source() interface reduces the current
>functionality of bleeding in any way.  Does it?  Please be specific if
>so.

You can only do a subset of formats: the non-salted ones.  No thought was
given to salted formats at all.  Yes, that was NOT the easiest thing to get
done; it took several versions before I got it the way it was.

It does improve things (albeit not implemented yet) with the hot/cold.  The
optimizations done earlier to reduce the size of a hash's binary return
were lost in get_source.  Now those optimizations are back in play, at
least once the hot/cold sees some coding.

I will wait on final thoughts, until I see how this gets inserted into
jumbo.  But I do think that salted formats, and likely dynamic, may run into
issues working with the source() interface as it is in core.

Jim.