Date: Sat, 5 Sep 2009 21:11:29 -0500
From: "JimF" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: Re: Thoughts and questions on creation of a 'generic' MD5 hash set format (to handle 'all' of them)

Since the initial release, I have tightened up the code for the 'generic' 
md5 module, added some new functionality, and sped things up.  I also fixed 
a nasty bug in the non-salted hashes, which caused them to behave in a 
'salted' way, slowing them way down.

One of the main speedups was putting keys directly into the #1 input 
buffer instead of into an array of strings made to hold the keys.  This can 
ONLY be done under a certain set of circumstances, and the code which loads 
the function 'primitives' into the array checks for all of these conditions. 
Doing this lets us avoid an extra buffer load (and various other 
performance penalties).  Things like md5($s.$p) could never load the keys 
directly into the input, since the keys are not the first part of the string 
anyway.  This speedup allows john to run md5($p) as fast as (actually faster 
on some systems than) the original 'hand coded' raw-md5.  It also provides 
about a 10% speedup (or 5% if 2 md5s are done, 3.3% if 3 md5s are done, etc.) 
for the formats where this 'option' is valid.  About half of the formats 
worked with this improvement.
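
To illustrate the idea, here is a minimal sketch of the two key-loading 
paths.  This is not the actual md5-gen code: the buffer names, the sizes, the 
keys_direct flag and both functions are hypothetical stand-ins, and the 
checks that decide whether a format qualifies are left out.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS   4    /* hypothetical batch size */
#define MAX_KEYLEN 55   /* hypothetical key length limit */

/* Hypothetical stand-ins for the md5-gen buffers. */
static char saved_key[MAX_KEYS][MAX_KEYLEN + 1]; /* separate array of keys */
static char input_buf1[MAX_KEYS][64];            /* the #1 input buffer */
static unsigned input_len1[MAX_KEYS];
static int keys_direct;   /* set when the loaded primitives allow it */

/* Store one candidate password. */
static void set_key(const char *key, int index)
{
    size_t len = strlen(key);
    assert(index >= 0 && index < MAX_KEYS && len <= MAX_KEYLEN);
    if (keys_direct) {
        /* Fast path: the key IS the start of input #1, so write it there. */
        memcpy(input_buf1[index], key, len);
        input_len1[index] = (unsigned)len;
    } else {
        /* Slow path: park the key; a primitive copies it later. */
        memcpy(saved_key[index], key, len + 1);
    }
}

/* Append-keys primitive: only the slow path pays for the extra copy. */
static void append_keys(void)
{
    int i;
    if (keys_direct)
        return;                       /* keys are already in place */
    for (i = 0; i < MAX_KEYS; i++) {
        size_t len = strlen(saved_key[i]);
        assert(input_len1[i] + len <= sizeof input_buf1[i]);
        memcpy(input_buf1[i] + input_len1[i], saved_key[i], len);
        input_len1[i] += (unsigned)len;
    }
}
```

In the fast path the per-key copy in append_keys disappears entirely, which 
is the extra buffer load mentioned above.  For something like md5($s.$p) the 
salt has to go in first, so only the slow path is possible.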

md5-gen now works with benchmark testing (-test), and 
a -subformat=md5_gen(#) command line option has also been added.
Now, these work just fine:
john -test
john -test -format=md5-gen      (tests md5_gen(0) or md5($p) )
john -test -format=md5-gen -subformat=md5_gen(4)   (tests the OSC or md5($s.$p) )

The phpass code was also added to md5-gen.  This was done by making 2 
primitives specifically for phpass (phpasssetup and phpasscrypt), and 
building 3 different base md5-gen functions.  When md5-gen goes into 
phpass mode, it first attaches the 3 different base functions (salt, 
set_salt, salt_hash), and then runs.  The phpasscrypt function is simply:

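void MD5GenBaseFunc__PHPassCrypt() {
  // iteration count is 2^n, where n is encoded in the salt's count character
  unsigned Lcount = 1 << atoi64[ARCH_INDEX(cursalt[8])];
  // first crypt: md5(salt . key), digest written back into the input buffer
  MD5GenBaseFunc__clean_input();
  MD5GenBaseFunc__append_salt();
  MD5GenBaseFunc__append_keys();
  MD5GenBaseFunc__crypt_to_input_raw();
  // input buffer now holds digest . key for the repeated crypts
  MD5GenBaseFunc__append_keys();
  while (--Lcount)  // note last crypt not done in this loop.
     MD5GenBaseFunc__crypt_to_input_raw_Overwrite_NoLen();
  // last crypt is done to the output buffer.
  MD5GenBaseFunc__crypt();
}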

That function is written to use all 'primitive' functions, just like it 
would if it had been loaded as an array of function pointers.  The 
reason it was not is that the current language has no way to load 
variables; the only state is the global count of keys, the 2 input 
'buffers' and the 2 output buffers.  There also are no looping or logical 
constructs within the simple language, so building a function like this 
was my best bet.
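
To make that limitation concrete, here is a small sketch of how a 'script' 
of primitives can be run when it is just an array of function pointers. 
Everything in it is hypothetical (the names, the NULL terminator, and the 
stub primitives that only record that they were called); it is meant to show 
the shape of the mechanism, not the real md5-gen code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*primitive_fn)(void);

/* Stub primitives: each one just records a letter so the order is visible. */
static char trace[64];
static size_t trace_len;

static void record(char c)
{
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

static void clean_input(void) { record('C'); }
static void append_salt(void) { record('S'); }
static void append_keys(void) { record('K'); }
static void crypt_out(void)   { record('X'); }

/* md5($s.$p) expressed as a fixed, NULL-terminated list of primitives. */
static const primitive_fn script_md5_sp[] = {
    clean_input, append_salt, append_keys, crypt_out, NULL
};

/* The interpreter: call each primitive once, in order.  There is no slot
 * for an argument, a counter or a branch, so "repeat this step Lcount
 * times" cannot be written as a script. */
static void run_script(const primitive_fn *script)
{
    for (; *script != NULL; script++)
        (*script)();
}
```

Calling run_script(script_md5_sp) leaves "CSKX" in trace.  A phpass-style 
loop whose count comes from the salt has nowhere to live in such a list, 
which is why it ended up as a hand-written function built from the same 
primitives.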

NOW, once that was done (the phpass hashes working), I wanted to see just 
what would be involved in hooking the md5-gen code up to other *_fmt.c 
files which use md5.  I started by seeing what was needed to get this to 
work for the existing phpass_fmt code.  I found that it was VERY easy. 
The code is written in C, but I simply had to step back a bit and ask 
myself what would be required to do this in C++ (a language I MUCH prefer 
over C).  In building a set of polymorphic classes in C++, one method is to 
build a good strong base class (like the md5-gen 'class'), and then derive 
classes from that, which define a small amount of code but use most of what 
was already coded in the base class.  Now, this is C, where we do not have 
language help to do this, so what I did was use the fmt_main structures. 
One of the key parts of fmt_main is the function pointers.  I simply 
created a fmt_main for phpass, and set it to have an init and a valid 
function.  Within init, I actually 'do' something: I assign all of the 
other function pointers in the phpass fmt_main structure to point to the 
functions listed in the fmt_main of md5-gen (I built a function in 
md5-gen-fmt.c which does this 'linkage' for you).

Then, besides valid and init, there are 2 other functions which are 
'required' within the phpass_fmt code.  These are salt and binary.  Within 
valid, all of the 'original' validation code is still there, but the 'last' 
step is to call valid from the md5-gen code.  To do that, I have to 
'convert' from $P$9ssssssssxxxxxxxxxxxxxxxx format into 
md5_gen(17)xxxxxxxxxxxxxxxx$ssssssss9 format.  I created a function that 
does this in a single sprintf step.  The ONLY thing salt and binary do is 
convert from phpass syntax into md5_gen(17) syntax, and call the salt (or 
binary) function within md5-gen.
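
For illustration, the conversion could look roughly like the sketch below. 
The function name, its return convention and the use of snprintf (instead of 
plain sprintf, to bound the write) are my assumptions here, not the code from 
the patch.  It only relies on the layout described above: "$P$", one count 
character, 8 salt characters, then the hash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical converter:
 *   "$P$" C SSSSSSSS HHHH...   ->   "md5_gen(17)" HHHH... "$" SSSSSSSS C
 * Returns 0 on success, -1 if the input is malformed or out is too small. */
static int phpass_to_md5_gen(const char *in, char *out, size_t out_size)
{
    int n;

    assert(in != NULL && out != NULL);
    /* need "$P$" + count char + 8 salt chars + at least one hash char */
    if (strncmp(in, "$P$", 3) != 0 || strlen(in) < 13)
        return -1;

    /* in + 12 is the hash, in + 4 the 8-char salt, in[3] the count char */
    n = snprintf(out, out_size, "md5_gen(17)%s$%.8s%c",
                 in + 12, in + 4, in[3]);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With this sketch, valid, salt and binary in the thin phpass file would each 
convert the ciphertext into a local buffer and then hand that buffer to the 
matching md5-gen function.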

That is all.  The phpass_fmt.c file is now very thin.  It has a fmt_main 
structure that fills out very little at compile time.  It then has a 
conversion function, and the valid/init/salt/binary functions.  We could 
easily do this for most any md5() type format.  Thus, running on a file full 
of 'raw-md5' hashes would work just fine (if raw-md5 'links' to md5-gen), 
and the end user will see no difference.  However, the code actually DOING 
the work is localized in the md5-gen-fmt.c file, and thus when new hardware 
is coded for, that is the ONLY area which needs to be tweaked.  So, when say 
CUDA gets added to md5-gen, then ALL formats that use those base objects 
will now be CUDA sped up, without having to spend the time to port each and 
every format.
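
The 'base class' trick can be sketched in plain C as follows.  Note that 
struct fmt_sketch is a hypothetical, heavily cut-down stand-in for the real 
fmt_main, the base functions are stubs, and link_to_base is only my guess at 
the shape of the linkage helper, so treat this as an outline of the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for John's fmt_main. */
struct fmt_sketch {
    const char *label;
    void  (*init)(void);
    int   (*valid)(char *ciphertext);
    void *(*binary)(char *ciphertext);
    void *(*salt)(char *ciphertext);
    void  (*crypt_all)(int count);
};

/* "Base class": stub versions of the shared md5-gen methods. */
static int base_crypt_calls;
static void  base_init(void) { }
static int   base_valid(char *c) { return strncmp(c, "md5_gen(", 8) == 0; }
static void *base_binary(char *c) { return c; }
static void *base_salt(char *c) { return c; }
static void  base_crypt_all(int count) { (void)count; base_crypt_calls++; }

static struct fmt_sketch fmt_base = {
    "md5-gen", base_init, base_valid, base_binary, base_salt, base_crypt_all
};

/* The 'linkage' helper: fill every slot the thin format left NULL. */
static void link_to_base(struct fmt_sketch *thin)
{
    assert(thin != NULL);
    if (!thin->binary)    thin->binary    = fmt_base.binary;
    if (!thin->salt)      thin->salt      = fmt_base.salt;
    if (!thin->crypt_all) thin->crypt_all = fmt_base.crypt_all;
}

/* "Derived class": supplies init and valid at compile time.  (The phpass
 * wrapper described above also supplies salt and binary converters; they
 * are left to be inherited here to keep the sketch short.) */
static void thin_init(void);
static int  thin_valid(char *c) { return strncmp(c, "$P$", 3) == 0; }

static struct fmt_sketch fmt_thin = {
    "phpass", thin_init, thin_valid, NULL, NULL, NULL
};

static void thin_init(void) { link_to_base(&fmt_thin); }
```

After fmt_thin.init() runs, a call to fmt_thin.crypt_all() lands in the base 
code.  That is why speeding up the base (for new hardware, say) would speed 
up every format linked this way.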

I hope to have a -v2 of the generic md5 patch out this weekend, since it is 
a 3 day weekend.

Jim.


