john-users - Re: Dupes recognition based on internal representation of ciphertext?

Message-ID: <BAY107-F3329960ED99752860677A3FDF20@phx.gbl>
Date: Wed, 15 Jun 2005 08:38:30 +0200
From: "Frank Dittrich" <frank_dittrich@...mail.com>
To: john-users@...ts.openwall.com
Subject: Re: Dupes recognition based on internal representation of ciphertext?

Hi,

>Arguably, the loader should be enhanced to also use internal
>representations when it avoids loading dupes(*) for cracking and when
>it displays cracked passwords.

there were no dupes in my sample password file.
Even if the internal hash representation or the canonical form
is the same, the "user names" differ.

Can you make the conversion of external hash representations into a
canonical form a (compile time) option?

Or, even better, a setting which depends on the ciphertext format?
This could be done by adding a new function pointer to fmt_main.

If it's a NULL pointer, each external hash representation will be
stored in john.pot, and "john --show" just compares the external
representations.
This will be the obvious choice for most ciphertext formats - those
without multiple external representations for a given hash.
It could even be used for ciphertext formats where alternative external
representations are few or rare, or where computing a canonical form
would be too expensive at run time.

If it points to a real function which converts an external hash
representation into its canonical form, this function is called before
storing a cracked hash in john.pot, and again by "john --show".
This function is used for ciphertext formats with possibly many external
representations, which are easily convertible into a canonical form.

Of course, this requires a minor adjustment for all implemented
ciphertext formats.
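
To make the proposal more concrete, here is a rough sketch of such an
optional hook. It is not the actual fmt_main layout: the struct
fmt_sketch, the hook name canonical, the helper same_hash() and the
CANON_MAX bound are all made up for illustration, and case-insensitive
hex is only an example of a format with several external forms.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define CANON_MAX 128  /* assumed upper bound on ciphertext length */

/* Hypothetical stand-in for the relevant part of fmt_main. */
struct fmt_sketch {
    const char *label;
    /* Writes the canonical form of ciphertext into out (size bytes).
     * Returns 0 on success, -1 if the result does not fit.
     * NULL means: compare external representations as they are. */
    int (*canonical)(const char *ciphertext, char *out, size_t size);
};

/* Example hook: hex digits are case-insensitive, so fold to lower case. */
int hex_canonical(const char *ciphertext, char *out, size_t size)
{
    size_t len = strlen(ciphertext);
    size_t i;

    assert(out != NULL);
    if (len >= size)
        return -1;
    for (i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)ciphertext[i]);
    out[len] = '\0';
    return 0;
}

/* Returns 1 if a and b denote the same hash under fmt, else 0. */
int same_hash(const struct fmt_sketch *fmt, const char *a, const char *b)
{
    char ca[CANON_MAX], cb[CANON_MAX];

    if (!fmt->canonical)
        return strcmp(a, b) == 0;
    if (fmt->canonical(a, ca, sizeof(ca)) ||
        fmt->canonical(b, cb, sizeof(cb)))
        return strcmp(a, b) == 0;  /* fall back to the external form */
    return strcmp(ca, cb) == 0;
}

/* One format without the hook, one with it. */
const struct fmt_sketch fmt_plain = { "plain", NULL };
const struct fmt_sketch fmt_hex = { "hex", hex_canonical };
```

With fmt_plain, "ABCDEF01" and "abcdef01" count as different hashes;
with fmt_hex they compare equal, which is the behaviour wanted both
when writing john.pot and for "john --show".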

BTW, theoretically, the problem of multiple external representations
for an internal representation can also occur for the salts.
I'm sure such (somewhat broken) hash algorithms exist.
To take care of this possibility, the newly introduced function
should convert salt+hash into a canonical form.
This could be a representation which corresponds to the rules
checked by the ciphertext-specific valid() function (in this case,
john --test could also verify this conversion works properly).
Or, one which inserts a format-specific marker, to reduce the risk of
collisions among different ciphertext formats (and allow for easier
grepping john.pot).
I'm not sure which implementation is to be preferred.
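
For the second variant, a minimal sketch of what a marked canonical
form might look like. The "$marker$salt$hash" layout, the function
name tag_canonical() and the marker strings are assumptions chosen
for illustration only; they are not an existing john.pot convention.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "$<marker>$<salt>$<hash>" in out (size bytes).
 * salt and hash are expected to be in canonical form already.
 * Returns 0 on success, -1 if the result would be truncated.
 * The layout is hypothetical. */
int tag_canonical(const char *marker, const char *salt, const char *hash,
                  char *out, size_t size)
{
    int n;

    assert(out != NULL && size > 0);
    n = snprintf(out, size, "$%s$%s$%s", marker, salt, hash);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Two formats that happen to produce the same salt and hash strings
would then still yield distinct john.pot entries, and a plain grep
for the marker would select the entries of one format.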

No matter how the conversion into a canonical representation works,
older john.pot entries for hash algorithms with multiple external
representations probably need to be converted into their canonical
representation. Otherwise, this conversion would have to be repeated
each time john.pot is loaded to check for cracked password hashes.
Could this conversion be offered through another symbolic link to john?
Some options to restrict the conversion to a particular format and
some sanity checks (call valid(), and double-check by re-computing the
password hash before converting the john.pot entry) would be good.
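
A sketch of how converting a single john.pot line could work. It
assumes the line has the form "ciphertext:plaintext" and that the
ciphertext itself contains no colon. The names convert_pot_line(),
is_hex32() and lower_hex() are hypothetical, with is_hex32() standing
in for a format's valid(). The re-computation of the password hash is
left out here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef int (*valid_fn)(const char *ciphertext);
typedef int (*canon_fn)(const char *ciphertext, char *out, size_t size);

/* Example valid(): exactly 32 hex digits. */
int is_hex32(const char *s)
{
    size_t i;

    if (strlen(s) != 32)
        return 0;
    for (i = 0; i < 32; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Example canonicalization: fold hex digits to lower case. */
int lower_hex(const char *in, char *out, size_t size)
{
    size_t len = strlen(in), i;

    if (len >= size)
        return -1;
    for (i = 0; i <= len; i++)  /* also copies the terminating NUL */
        out[i] = (char)tolower((unsigned char)in[i]);
    return 0;
}

/* Returns 1 if the line was converted, 0 if it was copied unchanged
 * because valid() rejected it, and -1 on error. */
int convert_pot_line(const char *line, valid_fn valid, canon_fn canon,
                     char *out, size_t size)
{
    const char *colon;
    char cipher[256], canonical[256];
    size_t clen;
    int n;

    assert(line != NULL && out != NULL);
    colon = strchr(line, ':');
    if (!colon)
        return -1;
    clen = (size_t)(colon - line);
    if (clen >= sizeof(cipher))
        return -1;
    memcpy(cipher, line, clen);
    cipher[clen] = '\0';

    if (!valid(cipher)) {  /* not this format: leave the entry alone */
        n = snprintf(out, size, "%s", line);
        return (n < 0 || (size_t)n >= size) ? -1 : 0;
    }
    if (canon(cipher, canonical, sizeof(canonical)))
        return -1;
    /* colon points at ":plaintext", so the plaintext is kept as is */
    n = snprintf(out, size, "%s%s", canonical, colon);
    return (n < 0 || (size_t)n >= size) ? -1 : 1;
}
```

Restricting the conversion to one format then amounts to passing that
format's valid() and canonicalization function; entries rejected by
valid() are written back untouched.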

>Alternatively, the split() method for affected hash types should be
>enhanced to canonicalize the text representations.

I'm not sure whether this can be easily done for each password hash
algorithm.

>>Of course, for raw MD5 the problem can be avoided by just
>>translating all hashes to lower case.
>
>BTW, this is best done in split().

I thought of converting the password file instead.

>Meanwhile, you can use the trivial fix to cracker.c if you like: in
>the log_guess() function call, remove the "dupe ? NULL : ".

That's exactly what I did, but I wasn't sure whether this change
would affect any other hash algorithm.


Thanks for your reply.

Frank Dittrich