Date: Fri, 9 Sep 2011 13:06:02 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: Strange timings in pdf

Dhiru, 

 

The pdf format has some 'strange' timings: multi-salt is slower than single
salt.  Usually, this is the other way around.

 

The problem is that all the work is being done within the set_salt()
function.  Normally, this would be much better done in the get_salt()
function.

 

Here is why:

 

 

Init of john:

   Load each hash line.  Call get_salt() and store the data this function
returns (the salt).

 

Running of john (word testing):

   Load the max allowable number of passwords for the format.  Then, for
each salt that was returned by the calls to get_salt() (done during john
loading), call set_salt(); crypt_all(); cmp_all().  This is repeated over
and over again, once per salt, for every block of password(s).
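
A minimal sketch of the two phases above, in C, using a toy model rather
than john's real loader or format interface.  All toy_* names, the counters
struct and MAX_SALTS are hypothetical and exist only to make the call
counts visible: get_salt runs once per loaded hash, while set_salt and
crypt_all run once per salt for every block of candidates.

```c
#include <assert.h>

#define MAX_SALTS 16  /* arbitrary limit for this toy model */

/* How often each (toy) format method was called. */
struct counters { unsigned long get_salt, set_salt, crypt_all; };

static struct counters calls;
static int salt_db[MAX_SALTS];   /* stand-in for the stored salts */
static const int *current_salt;  /* stand-in for the format's static data */

/* Called once per hash line at load time. */
static const void *toy_get_salt(int raw_hash_line, int slot)
{
    calls.get_salt++;
    salt_db[slot] = raw_hash_line;
    return &salt_db[slot];
}

/* Called once per salt, for every block of candidate passwords. */
static void toy_set_salt(const void *salt)
{
    calls.set_salt++;
    current_salt = salt;
}

static void toy_crypt_all(void)
{
    calls.crypt_all++;
}

/* Load n_salts hashes, then test n_blocks blocks of candidates. */
static struct counters toy_run(int n_salts, unsigned long n_blocks)
{
    const void *salts[MAX_SALTS];
    unsigned long b;
    int s;

    assert(n_salts > 0 && n_salts <= MAX_SALTS);
    calls.get_salt = calls.set_salt = calls.crypt_all = 0;

    /* Init of john: one get_salt call per loaded hash. */
    for (s = 0; s < n_salts; s++)
        salts[s] = toy_get_salt(s, s);

    /* Running of john: every block is tried against every salt. */
    for (b = 0; b < n_blocks; b++)
        for (s = 0; s < n_salts; s++) {
            toy_set_salt(salts[s]);
            toy_crypt_all();
            /* cmp_all() would be called here */
        }
    return calls;
}
```

Calling toy_run(10, 1000) models 10 salts tested against 1000 blocks of
candidates; the returned counters show how lopsided the call counts are.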

 

Thus, if there is 'work' to be done (on the salt only), it is best to do it
in get_salt(), so that set_salt() can be as fast as possible.  This will
speed up the format, sometimes greatly.  It does not matter how fast
get_salt() is, within reason: it is only run once for each salt.  However,
set_salt() is run once for each salt, for each 'set' of passwords loaded.
Thus, if your format processes 1 password at a time, and you have 10 salts,
and run against 10 billion candidate passwords, then set_salt() is called
100 billion times.  In this same run, get_salt() is only called 10 times,
at john loading.  So, suppose the processing of the raw salt string takes
.0000001s, but assigning a pointer takes 0s (I know this is not true, but
bear with me).  If get_salt() builds the parsed object, that costs
.0000001s per salt, one time, and the calls to set_salt() take 0s.
However, if get_salt() simply returns the unprocessed string, and
set_salt() is used to do the processing each time, then the amount of time
used in this faked up example is 100 billion * .0000001s, or 10,000 seconds
(just under 3 hours).
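
The arithmetic in this example can be checked with a few lines of C.  This
is only a sketch of the cost model described above; the function names and
the per-call cost parameter are hypothetical, and the .0000001s figure is
the made-up number from the example, not a measurement.

```c
#include <assert.h>

/*
 * Number of set_salt calls in a run: one per salt for every block of
 * candidates.  per_block is how many passwords the format takes at a time.
 */
static unsigned long long set_salt_call_count(unsigned long long salts,
                                              unsigned long long candidates,
                                              unsigned long long per_block)
{
    unsigned long long blocks;

    assert(per_block > 0);
    blocks = (candidates + per_block - 1) / per_block;  /* round up */
    return salts * blocks;
}

/* Time spent in set_salt if every call costs cost_per_call seconds. */
static double set_salt_seconds(unsigned long long salts,
                               unsigned long long candidates,
                               unsigned long long per_block,
                               double cost_per_call)
{
    return (double)set_salt_call_count(salts, candidates, per_block)
           * cost_per_call;
}
```

With 10 salts, 10 billion candidates, 1 password per block and a cost of
.0000001s per call, set_salt_seconds() comes out at roughly 10,000 seconds.
The same model also shows that a format taking more passwords per block
calls set_salt proportionally less often.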

 

 

The amount of code change required to get pdf_fmt into the 'proper' john
format is not that much.  Likely, simply have get_salt() do the work now
being done in set_salt(), but instead of setting the 'static' data of the
format, have it fill in some allocated structure, and return the address of
that structure.  Then in set_salt(), simply cast the void* to this
marshalling structure type, and copy the data from that structure into the
format's static data.  That way, you do not have to interpret the full
string, doing all the parsing each time.  The faster you can get this
set_salt() function to run, the better your 'multi-salt' times will be, and
the faster the format WILL run when someone is running it against a few
hashes at the same time.  The single salt timings will not be affected one
way or the other.  For a single salt, the salt is loaded one time, and then
the code that does the password testing simply loads candidates, calls
crypt_all(); cmp_all(); and then loads the next batch of passwords.
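
A minimal sketch of that split, in C.  This is not the pdf_fmt code: the
struct toy_salt layout, the "revision*key_bits*hexid" string and all toy_*
names are hypothetical, chosen only to show parsing done once in get_salt
and a plain struct copy in set_salt.  It assumes the string was already
checked (by something like valid()), and that the caller copies the
returned structure before the next get_salt call, since a static buffer is
reused here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed salt; the real pdf format has different fields. */
struct toy_salt {
    int revision;
    int key_bits;
    unsigned char id[16];
    size_t id_len;
};

static struct toy_salt cur_salt;  /* the format's 'static' data */

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return 0;  /* toy code: input is assumed to be valid lowercase hex */
}

/* Slow path, run once per loaded hash: do all of the string parsing. */
static void *toy_get_salt(const char *ciphertext)
{
    static struct toy_salt out;  /* caller must copy this out */
    char *p;
    size_t n = 0;

    memset(&out, 0, sizeof(out));
    out.revision = (int)strtol(ciphertext, &p, 10);
    assert(*p == '*');
    out.key_bits = (int)strtol(p + 1, &p, 10);
    assert(*p == '*');
    for (p++; p[0] && p[1] && n < sizeof(out.id); p += 2)
        out.id[n++] = (unsigned char)(hex_val(p[0]) << 4 | hex_val(p[1]));
    out.id_len = n;
    return &out;
}

/* Fast path, run once per salt per block: cast and copy, no parsing. */
static void toy_set_salt(const void *salt)
{
    cur_salt = *(const struct toy_salt *)salt;
}
```

The point of the design is that toy_set_salt() costs one fixed-size struct
copy no matter how complicated the salt string is, so the per-block
overhead stays flat as the number of loaded salts grows.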

 

If done like this, I bet those reversed numbers will get back to the 'right'
order, and the multi-salt test times will likely be 2x or 3x better than
they are today.

 

 

I am not making any changes myself, because I usually stay away from other
developers' code if they are likely to still be developing that code.
Another note: this format HAS had some changes to it (porting, and now a
tiny change to the BENCHMARK_TIMING), so you may want to get the current
version prior to starting any coding changes.

 

Jim.

 

 

