john-dev - RE: allocating salts

Message-ID: <020501cd39ad$b0357030$10a05090$@net>
Date: Thu, 24 May 2012 08:04:00 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: allocating salts

>-----Original Message-----
>From: Dhiru Kholia [mailto:dhiru.kholia@...il.com]
>
>Which way of the following ways of handling salts is better / preferred.
>
>Given,
>
>static struct custom_salt {
>        unsigned char userid[8 + 1];
>        char unsigned hash[8];
>} *salt_struct;
>
>
>1. #define SALT_SIZE sizeof(*salt_struct) and malloc'ing custom_salt
>structure for every salt (involves a single pointer copy but leaks
>memory) in get_salt().
>
>2. #define SALT_SIZE sizeof(struct custom_salt) and have a static
>custom_salt structure (involves whole buffer copy but doesn't leak
>memory)

In the above case, I would simply set salt_size within the format to
sizeof(struct custom_salt), use a static object (so it survives the return
from get_salt()), and allow JtR to manage the memory.  JtR will pass a
pointer to its own copy into set_salt(), and this pointer will be directly
usable in this case.
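
A minimal sketch of that fixed-size approach follows.  The "userid$hash"
ciphertext layout and the cur_salt variable are assumptions made up for the
illustration, not taken from any real format:

```c
#include <string.h>

/* Fixed-size salt, as in the question. */
struct custom_salt {
	unsigned char userid[8 + 1];
	unsigned char hash[8];
};

#define SALT_SIZE sizeof(struct custom_salt)

static struct custom_salt *cur_salt;

/*
 * Assumed ciphertext layout for this sketch only: "userid$hash", with up
 * to 8 raw characters in each field.  A real format parses its own encoding.
 */
static void *get_salt(char *ciphertext)
{
	static struct custom_salt cs;	/* static: still valid after we return */
	const char *sep = strchr(ciphertext, '$');
	size_t ulen, hlen;

	memset(&cs, 0, sizeof(cs));	/* unused bytes are zeroed */
	if (!sep)
		return &cs;
	ulen = (size_t)(sep - ciphertext);
	if (ulen > sizeof(cs.userid) - 1)
		ulen = sizeof(cs.userid) - 1;
	memcpy(cs.userid, ciphertext, ulen);
	hlen = strlen(sep + 1);
	if (hlen > sizeof(cs.hash))
		hlen = sizeof(cs.hash);
	memcpy(cs.hash, sep + 1, hlen);
	return &cs;
}

/* The caller hands back a pointer to its own SALT_SIZE-byte copy. */
static void set_salt(void *salt)
{
	cur_salt = (struct custom_salt *)salt;
}
```

Nothing is allocated by the format here, so there is nothing to leak; the
cost is one SALT_SIZE copy per salt at load time.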

The reason I allocated a salt 'struct' in the pkzip format, set the format's
salt_size to sizeof(struct *), and returned only that pointer, is that the
salts are extremely variable in size in pkzip.  JtR will always allocate
salt_size bytes, and then pass in a pointer to what it allocated.  However,
the salt_size it wants is 'fixed', based upon what is listed within the
format structure.  Thus, for pkzip, where I had an extremely variable sized
salt (the size of the compressed/crypted file, plus a little more), there
was no real way to tell JtR how much to allocate.  So in that instance, the
format was left in charge of proper memory allocation.  A structure was
created with allocated pointers, so that certain members could be completely
variable, and then JtR was only given back the contents of a pointer.  It
would then pass those bytes back in to set_salt(), where the format could
recover the pointer to the originally allocated salt structure.

Now, for instance, if this was your salt:

static struct custom_salt {
        unsigned char userid[128 + 1];
        char unsigned hash[128];
} *salt_struct;

If, in this format, the 'hash' value varies from 32 to 128 bytes, and the
userid is variable, from 1 to 129 bytes, then I think you will benefit from
handling your own allocation, and returning the data of a pointer from
within get_salt().  So the structure would be like this:

struct custom_salt {
	unsigned char *userid;
	unsigned char *hash;
	unsigned hash_len;
};

Then allocate the structure, and the proper sized userid and hash elements
within that structure, fill them in, and return the pointer data from
get_salt.  In that way, we only allocate what is needed, vs allocating 257
bytes for every salt.  It may be that your salt only takes 52 bytes, for one
hash.  The next one may take 256+, but you will allocate that much IFF it
was needed.
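
A rough sketch of the pointer-as-salt idea, under some assumptions:
make_salt() and xalloc() are hypothetical helpers invented for this example
(a real get_salt() receives the ciphertext string and parses the fields
itself), and plain malloc() is used only to keep the sketch self-contained.

```c
#include <stdlib.h>
#include <string.h>

struct custom_salt {
	unsigned char *userid;
	unsigned char *hash;
	unsigned hash_len;
};

/* Only the pointer is stored, so the salt size is the size of a pointer. */
#define SALT_SIZE sizeof(struct custom_salt *)

static struct custom_salt *cur_salt;

/* Hypothetical helper: allocate or give up. */
static void *xalloc(size_t size)
{
	void *p = malloc(size ? size : 1);

	if (!p)
		abort();
	return p;
}

/*
 * Hypothetical stand-in for get_salt(): takes already-parsed fields
 * instead of a ciphertext string.
 */
static void *make_salt(const char *userid, const unsigned char *hash,
                       unsigned hash_len)
{
	static struct custom_salt *ptr;	/* static: valid after we return */
	size_t ulen = strlen(userid) + 1;

	ptr = xalloc(sizeof(*ptr));
	ptr->userid = xalloc(ulen);	/* exactly as large as needed */
	memcpy(ptr->userid, userid, ulen);
	ptr->hash = xalloc(hash_len);
	memcpy(ptr->hash, hash, hash_len);
	ptr->hash_len = hash_len;
	return &ptr;	/* the caller copies SALT_SIZE bytes: the pointer value */
}

static void set_salt(void *salt)
{
	/* memcpy avoids assuming the stored copy is pointer-aligned */
	memcpy(&cur_salt, salt, sizeof(cur_salt));
}
```

The bytes JtR keeps per salt are just the pointer value; the variable-sized
members live in memory the format allocated itself.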


A slightly different method of get_salt()/set_salt() optimization can also
be used any time there is a lot of CPU time involved in computing the salt,
but this work can be done in advance.  Say you have 4 operations done in a
crypt.  There is a 10 byte salt that gets an SHA1.  Then a user name that
gets an SHA1.  Then those 2 results get appended and an SHA1 is done on
them.  Then the hex (base 16) of that SHA1 result gets appended to the
password, and a final SHA1 is done.  In this case, you would want to perform
the first 3 SHA1's in get_salt(), since that work is only done 1 time, at
JtR load:  hex(SHA1(SHA1(salt).SHA1(UName))), and return this 40 byte string
(using a static buffer) from get_salt(), setting the format's salt_size to
40 bytes.  Then JtR will simply pass that pre-computed hex string in to
set_salt(), you append it to the pass and do a single SHA1, then JtR will
call all compare functions.  In this case, you do have a fixed sized salt
(40 bytes), and have taken a 4 OP hash, and reduced it to a strcat and a
single OP.  Note, in this instance, the 'username' is never passed into the
get_salt() function.  Only the hash string.  So if the user name was NOT
part of the hash line, then you would need to build a prepare() function,
and append the user name to the hash string, so that it will be available.
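
Here is a sketch of that split between load-time and per-candidate work.
Big caveat: toy_digest() is a placeholder and is NOT SHA1; it only stands in
so the example compiles without a crypto library.  get_salt_from() and
crypt_one() are also hypothetical names, and the 128 byte password cap is
arbitrary.

```c
#include <string.h>

#define DIGEST_LEN 20			/* SHA1 output size */
#define SALT_SIZE (2 * DIGEST_LEN)	/* 40 hex characters, no terminator */
#define MAX_PASS 128			/* arbitrary cap for this sketch */

/* PLACEHOLDER, NOT SHA1: a toy mixer. Use a real SHA1 in an actual format. */
static void toy_digest(const void *data, size_t len, unsigned char *out)
{
	const unsigned char *p = data;
	unsigned long h = 2166136261UL;
	size_t i;
	int j;

	for (j = 0; j < DIGEST_LEN; j++) {
		for (i = 0; i < len; i++)
			h = ((h ^ p[i]) * 16777619UL) & 0xffffffffUL;
		h = ((h ^ (unsigned long)j) * 16777619UL) & 0xffffffffUL;
		out[j] = (unsigned char)(h >> 24);
	}
}

static void to_hex(const unsigned char *in, size_t len, char *out)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < len; i++) {
		out[2 * i] = digits[in[i] >> 4];
		out[2 * i + 1] = digits[in[i] & 0x0f];
	}
}

/* Load time, once per salt: hex(H(H(salt) . H(username))). */
static void *get_salt_from(const char *salt, const char *username)
{
	static char out[SALT_SIZE];
	unsigned char pair[2 * DIGEST_LEN], inner[DIGEST_LEN];

	toy_digest(salt, strlen(salt), pair);
	toy_digest(username, strlen(username), pair + DIGEST_LEN);
	toy_digest(pair, sizeof(pair), inner);
	to_hex(inner, sizeof(inner), out);
	return out;
}

static char cur_salt[SALT_SIZE];

static void set_salt(void *salt)
{
	memcpy(cur_salt, salt, SALT_SIZE);
}

/* Per candidate: one append and one digest of password . hex. */
static void crypt_one(const char *password, unsigned char *out)
{
	char buf[MAX_PASS + SALT_SIZE];
	size_t plen = strlen(password);

	if (plen > MAX_PASS)
		plen = MAX_PASS;
	memcpy(buf, password, plen);
	memcpy(buf + plen, cur_salt, SALT_SIZE);
	toy_digest(buf, plen + SALT_SIZE, out);
}
```

The three inner digests run once per salt in get_salt_from(); crypt_one()
is left with a copy and a single digest per candidate password.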

Again, each format needs to be looked at closely, on exactly what data is
presented, what operations are performed, what can be pre-computed, and what
it takes to best reduce both the memory footprint, and number of operations
used during the set_salt() and crypt_all() calls.  Often this becomes a
trade off between memory size, amount of precomputation that can be done,
etc.  In the prior example, we 'could' have reduced the salt size to 20
bytes, and performed the 'tohex' conversion within set_salt().  However,
saving these 20 bytes per salt comes at the CPU cost of doing a 40 character
hex conversion for each password/salt.  You simply have to look at the
format and decide which is best.  If it is a fast format, then keep
set_salt() doing the absolute minimum.  If this hash was doing 1000 SHA1's
in crypt_all(), then it would likely be better to ONLY store the 20 byte
binary hash as the salt, doing the conversion within set_salt(), since the
memory saving may be more important than the .001/s more cracks you might
possibly get by using a 40 byte buffer and precomputing hex.  But, if this
was a fast hash (as described), with only a single SHA1, you might get 5 to
10% improvement by having that 40 byte hex string as the salt (and would
have already gotten a 75% or more improvement, due to precomputing those 3
SHA1's in get_salt()).
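
For comparison, a sketch of the memory-saving variant, where only the 20
binary bytes are stored and set_salt() does the hex expansion.  The names
BINARY_LEN and salt_hex are made up for the illustration:

```c
#define BINARY_LEN 20			/* raw SHA1-sized value */
#define SALT_SIZE BINARY_LEN		/* bytes stored per salt */

static char salt_hex[2 * BINARY_LEN];	/* working copy, rebuilt per call */

/* Expands the stored 20 bytes into the 40 hex characters the crypt needs. */
static void set_salt(void *salt)
{
	static const char digits[] = "0123456789abcdef";
	const unsigned char *in = salt;
	int i;

	for (i = 0; i < BINARY_LEN; i++) {
		salt_hex[2 * i] = digits[in[i] >> 4];
		salt_hex[2 * i + 1] = digits[in[i] & 0x0f];
	}
}
```

This halves the stored salt, and pays for it with the loop above on every
set_salt() call.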


Off topic to this question, but things I would like to point out:

One other thing I would ask, is that you use JtR's memory functions from the
memory.c (vs strdup() and others).  I have some builds (such as VC), which
tell me of any unfreed memory upon exit. If you use strdup, (or calloc,
malloc, etc), knowing it will leak, these will all show as leaks in these
builds of mine.  If you use the memory.c functions of
mem_alloc_tiny()/mem_calloc_tiny() and  mem_alloc_copy() and
str_alloc_copy(), these actually use less memory, since they do not call the
allocator every time, but pack many calls into a single buffer, AND JtR will
clean up all of the memory these functions have allocated, prior to JtR
exiting, so that these memory allocations do not show up on any memory
leakage reports.
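
To illustrate why packing many small requests into one buffer helps, here
is a toy pool allocator.  This is NOT the memory.c code, only a sketch of
the general idea; tiny_alloc(), tiny_strdup(), tiny_cleanup() and CHUNK_SIZE
are all invented names.

```c
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 4096

struct chunk {
	struct chunk *next;
	size_t used;
	unsigned char data[CHUNK_SIZE];
};

static struct chunk *chunks;	/* every chunk handed out so far */

/* Carves small requests out of a shared chunk; NULL if it cannot. */
static void *tiny_alloc(size_t size)
{
	void *p;

	/* round up so each returned pointer stays pointer-aligned */
	size = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
	if (size > CHUNK_SIZE)
		return NULL;
	if (!chunks || chunks->used + size > CHUNK_SIZE) {
		struct chunk *c = malloc(sizeof(*c));

		if (!c)
			return NULL;
		c->next = chunks;
		c->used = 0;
		chunks = c;
	}
	p = chunks->data + chunks->used;
	chunks->used += size;
	return p;
}

static char *tiny_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = tiny_alloc(n);

	if (p)
		memcpy(p, s, n);
	return p;
}

/* One pass at exit releases everything, so nothing is reported as leaked. */
static void tiny_cleanup(void)
{
	while (chunks) {
		struct chunk *next = chunks->next;

		free(chunks);
		chunks = next;
	}
}
```

Two short strings duplicated this way land in the same chunk, with a single
underlying malloc() between them.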

As a final item to some of your coding Dhiru, please declare all variables
at the top of blocks, and only at the top of blocks.  There are compilers
that when compiling C programs will only allow declarations at top of a
block, prior to any code.  Just because gcc allows a variable to be declared
at an arbitrary location in C compiles does not mean that all compilers
allow this.
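
A small example of the layout I mean (sum_bytes() is just an invented
function for the illustration):

```c
#include <stddef.h>

/* Every declaration comes before the first statement of its block. */
static unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char b = buf[i];	/* top of the nested block is fine */

		total += b;
	}
	return total;
}
```

Writing "for (size_t i = 0; ...)" or declaring 'b' after "total += ..."
would be accepted by gcc, but rejected by a strict C89 compiler.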

Jim.