john-dev - Re: magnum-jumbo and magnum-bleeding (NOT J7), and the source() function

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120710112457.GB7788@openwall.com>
Date: Tue, 10 Jul 2012 15:24:57 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: magnum-jumbo and magnum-bleeding (NOT J7), and the source() function

On Mon, Jul 09, 2012 at 05:46:31PM -0500, jfoug wrote:
> I do think the interface for source() (or get_source(), or foo_bar() or
> whatever) is good.   Passing in the db_password structure pointer, and then
> 'using' the salt pointer in multiple ways, makes calling source() very easy
> to do, almost a direct replacement for the original pw->source pointer, AND
> saves memory, in not wasting space even for another pointer.

My tentative plan is to reuse the pw->source pointer for the "cold"
portion of the binary, when we do introduce the hot/cold split.  I do
not see why you need it for storing the salt.  I think that source()
should accept "void *binary" and "void *salt" instead of "struct
db_password *".  Yes, this is two parameters to pass instead of one,
but so what.  We're not calling source() from so many places that the
extra parameter would make the source code much less readable.  In fact,
preserving the separation between a "format" and a "database" is good.
I might deviate from this separation in another place, but that's for a
specific good reason, and it'd be an exception that proves the rule. ;-)
I don't see a good reason like that for source().

> Also, passing
> in the buffer is pretty much needed, since that allows this function to be
> called from multi-threaded code.  I am not 'sure' it will be called from MT
> code, but it certainly could be.

This could equally apply to all other format methods.  I think we
shouldn't single source() out just because we're introducing it much
later.  I don't see any immediate need to call source() from MT code
(and outside of a critical section).  When a password gets cracked, we
probably need a critical section anyway (to avoid races in updating the
database) and we don't care about performance of that piece much (we do
care a little bit, so the old full hash table rebuild code was no good).

So I was going to use a static buffer (or callee-allocated on the first
invocation if that's what an implementation prefers to do) just like
binary() and salt() usually do.

On the other hand, this may waste more memory across a large number of
formats than a single LINE_BUFFER_SIZE buffer in the caller does
(besides, that one is usually on stack, so it only exists while the
caller function runs).

If we really want to make things MT-safe, we'd need to do it for
prepare(), split(), binary(), salt() as well, but that will be a lot of
currently unneeded changes in jumbo (given the number of formats).

We could even take this as far as making the formats interface 100%
MT-safe, by introducing context structures similar to OpenSSL's.  That
would likely have performance impact in some cases, though.

So I'm not sure.

For some other format interfaces, yes, I do have thoughts on making them
MT-safe - e.g., this makes sense for set_key() (an FMT_* flag may say
so), or/and we may have a set_block(), where the caller may use a block
(set of key indices) per thread, and other calls such as set_key(),
set_salt(), crypt_all(), cmp_*(), get_hash*() would need to be MT-safe
as long as they're made for different block IDs.

Such support for blocks is also desirable for async crypt_all(), which
in turn is desirable to more cleanly support overlapping transfers of
keys to GPU with computation on GPU.  Currently, this is implementable
non-cleanly with a check for index == max_keys_per_crypt/2 in set_key().

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.