john-users - Re: Re: Dupes recognition based on internal representation of ciphertext?

Message-ID: <20050623005340.GA15805@openwall.com>
Date: Thu, 23 Jun 2005 04:53:40 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Re: Dupes recognition based on internal representation of ciphertext?

I wrote:
> >Arguably, the loader should be enhanced to also use internal
> >representations when it avoids loading dupes(*) for cracking and when
> >it displays cracked passwords.

On Wed, Jun 15, 2005 at 08:38:30AM +0200, Frank Dittrich wrote:
> there were no dupes in my sample password file.
> Even if the internal hash representation or the canonical form
> is the same, the "user names" differ.

I was not referring to your specific problem.  I was pointing out that
dupes handling in John needs to be made more consistent.

> Can you make the conversion of external hash representations into a
> canonical form a (compile time) option?

No.  The John "core" operates on the internal representation of hashes
anyway.  While I could make the dupes detection optionally use the
external (non-canonicalized) representation in all places (loader,
logger, etc.), I do not think that having this as an option is a good
idea.  It would be very confusing to most John users.  Even those few
who would think they know what this stuff is about would in fact likely
be wrong about some aspects of it.

This is one of the reasons why I've suggested a different long-term fix
in my previous response to you.  The idea is to have "john --show"
compare internal representations, and only fall back to simple string
comparisons for hash types that are not supported by the version of John
being used.
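
A minimal sketch of that fallback logic, not John's actual code: the
names decode_fn, hex_decode and same_hash are hypothetical, and a plain
hex decoder stands in for whatever conversion to binary a supported
format would provide.  Passing NULL as the decoder models a hash type
that the running version does not support.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINARY 64

/* Hypothetical decoder type: converts a text hash into bytes.
 * Returns the number of bytes written, or -1 if the text is invalid. */
typedef int (*decode_fn)(const char *text, unsigned char *out, size_t max);

static int hex_digit(int c)
{
	if (c >= '0' && c <= '9') return c - '0';
	c = tolower((unsigned char)c);
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	return -1;
}

/* Stand-in decoder for hex-encoded hashes such as raw MD5. */
static int hex_decode(const char *text, unsigned char *out, size_t max)
{
	size_t len = strlen(text), i;

	if (len == 0 || len % 2 || len / 2 > max) return -1;
	for (i = 0; i < len / 2; i++) {
		int hi = hex_digit(text[2 * i]);
		int lo = hex_digit(text[2 * i + 1]);
		if (hi < 0 || lo < 0) return -1;
		out[i] = (unsigned char)(hi << 4 | lo);
	}
	return (int)(len / 2);
}

/* Compare decoded forms when a decoder exists and accepts both strings;
 * otherwise fall back to a simple string comparison. */
static int same_hash(decode_fn decode, const char *a, const char *b)
{
	unsigned char ba[MAX_BINARY], bb[MAX_BINARY];
	int la, lb;

	if (decode) {
		la = decode(a, ba, sizeof(ba));
		lb = decode(b, bb, sizeof(bb));
		if (la > 0 && lb > 0)
			return la == lb && !memcmp(ba, bb, (size_t)la);
	}
	return !strcmp(a, b);
}
```

With hex_decode, an upper-case and a lower-case spelling of the same raw
MD5 hash compare equal; with a NULL decoder they do not, which is the
old string-comparison behavior.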

Nevertheless, I will comment on your proposed approach:

> Or, even better, a setting which depends on the ciphertext format?
> This could be done by adding a new function pointer to fmt_main.

We've got split() already.  This function may do more than just split
LM-like hashes into their halves; it may also canonicalize ASCII
representations of any hash types for their storage in john.pot.

> BTW, theoretically, the problem of multiple external representations
> for an internal representation can also occur for the salts.

It does occur in practice.  Even worse, for the traditional DES-based
crypt(3) hashes, the way invalid salt characters are treated is
implementation-specific.  I am aware of two kinds of implementations
(that is, two different mappings of invalid salt characters onto the
6-bit values); John supports only one of those (and enhancing it to
support both is not trivial).

John does not canonicalize the salts it stores into john.pot in any way,
but it does use the internal representation when determining whether two
salts are different or the same.  So it will never waste CPU cycles
hashing candidate passwords against two different representations of the
same salt, and it will never load more than 4096 salts for traditional
DES-based crypt(3) hashes.  But the loader (and "john --show") might not
recognize that a password hash has already been cracked if the instance
stored in john.pot has another representation of the same salt.  Of
course, these occurrences (two hashes of the same plaintext password,
with the same salt, but with different representations of the salt) are
very rare.
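
To illustrate how two text salts can share one internal value, here is a
rough sketch that reduces a two-character salt to 12 bits (hence at most
4096 distinct salts).  The two character mappings below are assumptions
chosen for illustration only; they are not claimed to be the two
implementations mentioned above, nor the one John uses.  All function
names are hypothetical.

```c
/* Mapping A (illustrative): characters outside "./0-9A-Za-z" map to 0. */
static int to6_strict(unsigned char c)
{
	if (c >= 'a' && c <= 'z') return c - 'a' + 38;
	if (c >= 'A' && c <= 'Z') return c - 'A' + 12;
	if (c >= '.' && c <= '9') return c - '.';
	return 0;
}

/* Mapping B (illustrative): plain arithmetic, truncated to 6 bits, so
 * invalid characters alias onto whatever value the arithmetic yields. */
static int to6_arith(unsigned char c)
{
	int v;

	if (c >= 'a') v = c - 59;
	else if (c >= 'A') v = c - 53;
	else v = c - '.';
	return (int)((unsigned)v & 0x3f);
}

typedef int (*to6_fn)(unsigned char);

/* Internal 12-bit salt from the first two characters of the text salt.
 * The caller must pass a string of at least two characters. */
static unsigned salt12(to6_fn to6, const char *salt)
{
	return (unsigned)to6((unsigned char)salt[0]) |
	    (unsigned)to6((unsigned char)salt[1]) << 6;
}

/* Two text salts are "the same" if their internal values match. */
static int same_salt(to6_fn to6, const char *a, const char *b)
{
	return salt12(to6, a) == salt12(to6, b);
}
```

Under mapping A the invalid salt ":b" is the same as ".b"; under mapping
B it is the same as "Ab" instead.  A string comparison of the text salts
would treat all three as different.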

> To take care of this possibility, the newly introduced function
> should convert salt+hash into a canonical form.

It'd be tricky to implement, but split() could do it.  In fact, it could
even produce multiple canonicalized hashes to cover the implementation
differences I've mentioned above.

> This could be a representation which corresponds to the rules
> checked by the ciphertext-specific valid() function

FWIW, right now, both pre-split() and post-split() strings are supposed
to be valid().

> Or, one which inserts a format-specific marker, to reduce the risk of
> collisions among different ciphertext formats (and allow for easier
> grepping john.pot).

Yes, split() does just that for LM hashes already.
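
As a rough sketch of that idea, assuming the marker is a "$LM$" prefix
in front of each 16-hex-digit half: lm_split_half is a hypothetical
helper, not the real split() method, whose signature and details may
differ.

```c
#include <stddef.h>
#include <string.h>

#define LM_TAG "$LM$"
#define LM_HALF_HEX 16

/* Write half `index` (0 or 1) of a 32-hex-digit LM hash into `out` as
 * "$LM$" followed by 16 hex digits.  Returns 0 on success, -1 on bad
 * input or if `out` is too small.  Hex digits are copied unchanged. */
static int lm_split_half(const char *hash32, int index,
    char *out, size_t outsz)
{
	const size_t tag_len = sizeof(LM_TAG) - 1;

	if (index < 0 || index > 1) return -1;
	if (strlen(hash32) != 2 * LM_HALF_HEX) return -1;
	if (outsz < tag_len + LM_HALF_HEX + 1) return -1;

	memcpy(out, LM_TAG, tag_len);
	memcpy(out + tag_len, hash32 + index * LM_HALF_HEX, LM_HALF_HEX);
	out[tag_len + LM_HALF_HEX] = '\0';
	return 0;
}
```

A tagged half cannot be mistaken for a 16-hex-digit hash of another
type, and "grep -F '$LM$' john.pot" finds such entries directly.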

> No matter how the conversion into a canonical representation works,
> older john.pot entries for hash algorithms with multiple external
> representations probably need to be converted into their canonical
> representation,

I disagree.

> otherwise this conversion would have to be done
> each time you load john.pot to check for cracked password hashes.

Old entries could continue to be compared in the old-fashioned way.
Also, the conversion should be cheap enough even if we did it for all
entries on the fly.

> Another symbolic link to john?
> Some options to restrict the conversion to a particular format and
> some sanity checks (call valid(), and double-check by re-computing the
> password hash before converting the john.pot entry) would be good.

That's way too much complexity for the user.

> >Alternatively, the split() method for affected hash types should be
> >enhanced to canonicalize the text representations.
> 
> I'm not sure whether this can be easily done for each password hash
> algorithm.

There might exist some for which it would not be easy, but then it would
also not be easy to do within a new function.

> >>Of course, for raw MD5 the problem can be avoided by just
> >>translating all hashes to lower case.
> >
> >BTW, this is best done in split().
> 
> I thought of converting the password file instead.

I had guessed that; I've just suggested a better way to do it.
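
A minimal sketch of such a canonicalization for raw MD5, assuming the
text form is exactly 32 hex digits: canon_raw_md5 is a hypothetical
helper showing what a split() method could do, not code from John.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MD5_HEX_LEN 32

/* Copy a 32-hex-digit raw MD5 hash into `out`, folding it to lower
 * case.  Returns 0 on success, -1 if the input is not 32 hex digits or
 * `out` cannot hold the result plus the terminating NUL. */
static int canon_raw_md5(const char *in, char *out, size_t outsz)
{
	size_t i;

	if (strlen(in) != MD5_HEX_LEN || outsz <= MD5_HEX_LEN)
		return -1;
	for (i = 0; i < MD5_HEX_LEN; i++) {
		if (!isxdigit((unsigned char)in[i]))
			return -1;
		out[i] = (char)tolower((unsigned char)in[i]);
	}
	out[MD5_HEX_LEN] = '\0';
	return 0;
}
```

Doing this at load time leaves the password file untouched while giving
john.pot a single spelling per hash.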

> >Meanwhile, you can use the trivial fix to cracker.c if you like: in
> >the log_guess() function call, remove the "dupe ? NULL : ".
> 
> That's exactly what I did, but I wasn't sure whether this change
> would affect any other hash algorithm.

It does.  But the only impact is the potential for duplicate entries in
john.pot, so that's OK for a private-use hack.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments
