Date: Mon, 25 May 2015 10:32:07 +0200
From: magnum <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: Bleeding jumbo now defaults to UTF-8

On 2015-05-24 06:17, Solar Designer wrote:
> On Fri, May 22, 2015 at 06:33:42PM +0200, magnum wrote:
>> On 2015-05-22 16:48, Marek Wrzosek wrote:
>>> That's great news! What is the simplest way to "repair" all.lst from
>>> Openwall?
>>
>> I bet it's a mix of encodings, so it can't simply be converted.
>
> Yes.  And maybe it should stay as a mix of encodings despite magnum's
> change, because quite often multiple encodings may have been
> used in target passwords.

Yes, it might be worthwhile to keep one copy like that. RockYou is a
real-world case where most of the passwords were UTF-8 but some were
ISO-8859-1/CP1252 and a few were something else.

> I am worried that some lines are not valid UTF-8, though.

If used with the new defaults, a warning will be emitted and conversion
will be truncated whenever an 8-bit byte that is not valid UTF-8 is seen
("Möller" in ISO-8859-1 will become "M").
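
A rough Python illustration of that truncation, assuming each line is decoded as UTF-8 and cut off at the first invalid byte. The function name `truncate_at_invalid_utf8` is hypothetical; this is not JtR's actual code, just a sketch of the observable behaviour.

```python
def truncate_at_invalid_utf8(line: bytes) -> tuple[str, bool]:
    """Decode `line` as UTF-8, stopping at the first invalid byte.

    Returns the decoded prefix plus a flag saying whether truncation
    happened (the point where a warning would be emitted).
    """
    try:
        return line.decode("utf-8"), False
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid UTF-8;
        # everything before it decoded cleanly.
        return line[:err.start].decode("utf-8"), True


word = "Möller".encode("iso-8859-1")  # b'M\xf6ller', 0xF6 is not valid UTF-8 here
print(truncate_at_invalid_utf8(word))  # ('M', True)
```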

> How do we ensure those are tested against the hashes
> verbatim, like core (non-jumbo) JtR would test them?  Will this just
> happen that way despite of the recent change of default in jumbo?

If running with --enc=raw, no warnings will be emitted and it will
behave just like non-jumbo (at least in this regard). This is actually
just an alias for --enc=ascii, but the latter name might be confusing
for this use.

> magnum, what do you suggest we do?  Simply assuming that e.g. md5crypt
> hashes are likely of UTF-8 plaintexts won't do.  Some of them might be,
> but some older ones might be iso-8859-1 or koi8-r or windows-1251 as
> well.  That's why current all.lst mixes all of these encodings together.
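
To see why such a mixed file cannot simply be converted, here is a minimal Python sketch decoding one line's bytes under each of the codepages mentioned above. The helper name `decodings` is hypothetical; the point is that nothing in the bytes themselves says which reading was intended.

```python
def decodings(raw: bytes) -> dict[str, str]:
    """Decode the same bytes under several legacy single-byte codepages."""
    # Every byte is valid in each of these codepages, so all three decodes
    # succeed, but byte 0xF6 maps to a different character in each one.
    return {codec: raw.decode(codec) for codec in ("iso-8859-1", "koi8-r", "cp1251")}


for codec, text in decodings(b"M\xf6ller").items():
    print(codec, text)
```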

You would either run a mixed-codepage wordlist with --enc=raw (but, just
like core john, you won't get e.g. case-flipping of 8-bit characters.
Also note that while this may be sensible for md5crypt, it isn't for
NT or any other hash that uses UTF-16 internally).
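
A small Python sketch of both caveats, using Python's own codecs as a stand-in for what a cracker would do internally (not JtR code). Bytes treated as opaque only get ASCII case changes, and since NT hashes MD4 over UTF-16LE, the bytes must be decoded first, so the result depends on which codepage is assumed.

```python
raw = "möller".encode("iso-8859-1")  # b'm\xf6ller'

# Treated as opaque bytes (the --enc=raw situation), only ASCII letters
# change case; the 8-bit byte 0xF6 ('ö' in ISO-8859-1) is left alone.
upper_raw = raw.upper()

# With the encoding known, the non-ASCII letter is case-flipped too.
upper_known = raw.decode("iso-8859-1").upper().encode("iso-8859-1")

# NT's hash input is UTF-16LE, so the same bytes yield different hash
# inputs depending on the codepage assumed when decoding them.
as_latin1 = raw.decode("iso-8859-1").encode("utf-16-le")
as_koi8r = raw.decode("koi8-r").encode("utf-16-le")

print(upper_raw)              # b'M\xf6LLER'
print(upper_known)            # b'M\xd6LLER'
print(as_latin1 == as_koi8r)  # False
```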

Or you'd use UTF-8 wordlist(s) (perhaps some of the non-"all" ones) and 
specify a target encoding. This will work for NT et al too.
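
The idea behind the second approach, sketched in Python under the assumption that each wordlist line is valid UTF-8 and is re-encoded into the target codepage before being hashed. The function `to_target` is hypothetical, and how JtR itself handles unrepresentable words is not shown here; this sketch simply skips them.

```python
from typing import Optional


def to_target(line: bytes, target: str) -> Optional[bytes]:
    """Re-encode one UTF-8 wordlist line into the target codepage.

    Returns None when the word contains characters the target codepage
    cannot represent, so this sketch would skip that candidate.
    """
    word = line.decode("utf-8")
    try:
        return word.encode(target)
    except UnicodeEncodeError:
        return None


print(to_target("Möller".encode("utf-8"), "cp1252"))  # b'M\xf6ller'
print(to_target("Möller".encode("utf-8"), "koi8-r"))  # None (no 'ö' in KOI8-R)
```

Because the wordlist is in one known encoding, the same file can feed both byte-oriented hashes (after conversion) and UTF-16 ones like NT.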

magnum
