Date: Tue, 25 Aug 2009 22:35:58 -0500
From: "JimF" <>
To: <>
Subject: Re: Thoughts and questions on creation of a 'generic' MD5 hash set format (to handle 'all' of them)

----- Original Message ----- 
From: "Solar Designer" <>
To: <>
Sent: Thursday, August 20, 2009 11:51 AM
Subject: Re: [john-users] Thoughts and questions on creation of a 'generic' 
MD5 hash set format (to handle 'all' of them)

> Jim,
> You have some good thoughts here.  Thank you for posting this.

I see several 'parts' that need to be addressed.

1. I see the need to build some 'foundation' functions and to build them into a
framework which john can use (this email will talk about what I have done
towards getting something like this working).

2. I see the need for some method of providing said generic information to
john. Needed info would be things like the input file signatures, how to
split up the salt(s), user id and hash, what base the 'hash' is in, whether
md5 is called upon itself and, if so, what format the intermediates are in,
etc. I have ideas on how to do some of this, but will address those in a
later email.

3. There is a need to build an initial 'pre-coded' working set, and to have
the ability to add totally new input types (expressions) without having to
rebuild (i.e. an expression parser and code generator). Again, that is for a
later email.

Ok here goes,

I have some generic 'foundation' functions working. I took the original
raw-MD5 code (i.e. all of md5_cmp_all(), md5_cmp_exact() and md5_cmp_one())
and made some changes to the loaders and to some of the salt handling.
The code makes these assumptions:

1.  There is an array of keys and key lengths.
2.  There is an array of input buffers and input buffer lengths. (Note:
there will likely need to be more than one input buffer set.)
3.  There is an array of crypt_keys (output buffers; these are also what the
comparison functions check when looking for hits).
4.  There is a global static variable, count, holding the count passed in by
the gen_crypt_all(count) function.
5.  All of the foundation functions work with the above data.
6.  The data need not have the same layout in the SSE and x86 builds.
7.  All of the foundation functions will behave 'the same' in SSE or x86 code
when viewed from the 'outside'. This rule 'could' be relaxed for optimization
reasons, but that would require different parser/loader functions for x86
and SSE.
I have then made these 'foundation' MD5 functions:

void MD5GenBaseFunc__reset_input()
void MD5GenBaseFunc__append_keys()
void MD5GenBaseFunc__append_salt()
void MD5GenBaseFunc__append_user()  // not done yet
void MD5GenBaseFunc__crypt()   // crypts all 'input_buffers' and places the 'raw' output into crypt_keys
void MD5GenBaseFunc__output_2_input_base16()   // appends the finalized crypt_keys output, as a base-16 string, to the input buffers
void MD5GenBaseFunc__output_2_input_base64()   // not done yet

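With this, simply calling the functions in this order:

void crypt_all(int count)
{
   MD5GenBaseFunc__reset_input();
   MD5GenBaseFunc__append_keys();
   MD5GenBaseFunc__crypt();
   MD5GenBaseFunc__reset_input();
   MD5GenBaseFunc__output_2_input_base16();
   MD5GenBaseFunc__append_salt();
   MD5GenBaseFunc__crypt();
}

will properly perform ALL needed tasks for processing md5(md5($p).$s), and

   MD5GenBaseFunc__reset_input();
   MD5GenBaseFunc__append_keys();
   MD5GenBaseFunc__append_salt();
   MD5GenBaseFunc__crypt();

would be all that is needed for md5($p.$s).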

I 'hard' coded the above format (md5(md5($p).$s)) before getting all of the
foundation functions done. The speed difference between a fairly optimized
hard-coded version and the above version (calling the foundation functions)
was not bad at all: pretty much a 'wash' for SSE, and a couple percent slower
for x86.

However, the best part is that now we can simply have an array of function
pointers, set these to the proper foundation functions (with a NULL pointer
to end the list), and then the crypt_all() function could be coded something
like this to do ALL forms of md5:

static void (*funcs[32])(void);   // filled in per format, NULL-terminated

void crypt_all(int count)
{
  int i;
  for (i = 0; funcs[i]; ++i)
    funcs[i]();   // call each foundation function in order
}

NOTE: I think there are a couple of other foundation functions that might
also be good for certain optimizations: multiple input buffers, multiple
output buffers, allowing 'manipulations' to the lengths (so that certain
'appends' that would always be done at the same location could be done once,
to avoid memory movement), etc. However, I don't think any of them would
really be 'required', and I do not think they would help once a parser that
fills the function pointer table from a given expression was built.

