john-users - Re: problems with umlauts in charset-files

Message-ID: <20060328152420.GA10181@openwall.com>
Date: Tue, 28 Mar 2006 19:24:20 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: problems with umlauts in charset-files

Frank has already provided the correct answers (thanks!), but I'll
comment on this anyway:

On Tue, Mar 28, 2006 at 09:26:18AM +0200, Uwe Danz wrote:
> I tried to generate my own char set: "german.chr"
> But it was not possible to add umlauts to this file.
> 
> john@...ux:~/john-1.7.0.2/run> cat john.pot
> :abc???
> :abc?
> :dddd dddd
> john@...ux:~/john-1.7.0.2/run> ./john --make-charset=german.chr
> Loaded 1 plaintext
> Generating char sets... 1 2 3 4 5 6 7 8 DONE
> Generating cracking order... DONE
> Successfully written char set file: german.chr (2 characters)
> 
> But it seems that umlauts and the special German "s" cannot be part of the
> *.chr file.

They can - after you adjust the CHARSET_* settings in params.h and
rebuild John.  But you should only want to do that if you have enough
statistical information on those characters - perhaps thousands of
passwords with those characters in your john.pot.
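
To judge whether john.pot holds enough samples, one could count the
plaintexts that contain bytes outside the supported range.  The following
is only a rough sketch, not part of John: it assumes the default range is
printable ASCII (0x20 to 0x7E; check the CHARSET_* settings in your own
params.h) and that each john.pot line is "ciphertext:plaintext" split at
the first colon.  The function name and the sample lines are made up for
illustration.

```python
def count_pot_plaintexts(lines, lo=0x20, hi=0x7E):
    """Count plaintexts inside and outside the byte range [lo, hi].

    `lines` is any iterable of bytes objects, e.g. a file opened in
    binary mode.  The default range is an assumption (printable ASCII).
    Returns a tuple (in_range, out_of_range).
    """
    in_range = out_of_range = 0
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        # Assumption: the plaintext is everything after the first colon.
        _, sep, plaintext = line.partition(b":")
        if not sep or not plaintext:
            continue  # skip malformed lines and empty plaintexts
        if all(lo <= b <= hi for b in plaintext):
            in_range += 1
        else:
            out_of_range += 1
    return in_range, out_of_range


# Hypothetical sample resembling the john.pot above (ISO-8859-1 bytes).
sample = [b":abc\xe4\xf6\xfc\n", b":abc\xdf\n", b":dddd dddd\n"]
print(count_pot_plaintexts(sample))
```

For a real file, pass the file object opened with open("john.pot", "rb").
If the second count is far below the thousands mentioned above, rebuilding
John for 8-bit charsets is unlikely to pay off.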

If you don't have the statistical information, you may be better off
using "single crack" and wordlist mode rules with umlauts, as well as an
external mode to catch any really short passwords.  Please refer to this
older john-users posting on the support for 8-bit characters with each
of the four cracking modes:

	http://article.gmane.org/gmane.comp.security.openwall.john.user/414

This includes the actual code for the "8bit" external mode.

Of course, you should also continue using the "incremental" mode with
the supplied all.chr in order to catch those weak passwords which don't
contain any 8-bit characters.

> And much worse - lines that contain valid passwords with special
> characters (e.g. umlauts) are completely ignored (in my example, only
> the last line was parsed).

I disagree with you that this behavior is any "worse" than any other I
could have implemented (short of actually supporting arbitrary
characters with the default build of John).  Really, what else could
John do?  Just skip the unsupported characters, as if those passwords
were shorter?  That would result in incorrect estimated conditional
probabilities for the supported characters (they would be applied to
wrong password lengths, character positions, and adjacent characters),
in turn resulting in a worse success rate.  Replace those unsupported
characters with supported ones, e.g. with spaces?  That would keep
password lengths and character positions correct, but it would result in
sequences of characters that are not seen in the sample passwords being
erroneously considered "likely".  Additionally, even the beginnings of
sample passwords which do not contain the unsupported characters (that
is, the string "abc" in your example) are undesirable for "inclusion" in
the .chr file because they reflect probabilities of certain substrings
in passwords of a specific kind - e.g., German words with umlauts - but
complete passwords of this kind wouldn't be tried.
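
The two rejected alternatives can be illustrated with a toy model.  This
is a simplified sketch that only counts adjacent character pairs; it is
not how John actually builds a .chr file, and the sample word and the
helper names are made up for illustration.  The supported set is assumed
to be printable ASCII.

```python
from collections import Counter

# Assumption: the supported set is printable ASCII, 0x20 to 0x7E.
SUPPORTED = set(map(chr, range(0x20, 0x7F)))


def bigrams(password):
    """Count adjacent character pairs in a password."""
    return Counter(zip(password, password[1:]))


def drop_unsupported(password):
    """Alternative 1: skip unsupported characters entirely."""
    return "".join(c for c in password if c in SUPPORTED)


def replace_unsupported(password, filler=" "):
    """Alternative 2: replace unsupported characters with a filler."""
    return "".join(c if c in SUPPORTED else filler for c in password)


original = "gr\u00fc\u00dfe"  # hypothetical sample: g, r, u-umlaut, sharp s, e

dropped = drop_unsupported(original)
replaced = replace_unsupported(original)

# Pairs that the transformed sample suggests but the original never had.
print(len(original), len(dropped), set(bigrams(dropped)) - set(bigrams(original)))
print(len(original), len(replaced), set(bigrams(replaced)) - set(bigrams(original)))
```

Dropping the characters shortens the sample and makes "r" appear directly
before "e", which never happens in the original.  Replacing them keeps the
length but makes pairs involving spaces look likely.  Either way, the
statistics would describe passwords that nobody actually chose.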

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar