john-dev - Re: Run-time change of a format's max length

Message-ID: <2c162866378975881cb362266609ab5f@smtp.hushmail.com>
Date: Fri, 14 Dec 2012 00:55:34 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Run-time change of a format's max length

On 13 Dec, 2012, at 23:47, Frank Dittrich <frank_dittrich@...mail.com> wrote:
> On 12/13/2012 01:15 AM, Solar Designer wrote:
>> As to plaintext_length, I intend to add a flag that would indicate
>> whether truncation at this length occurs in actual use of the target
>> systems (that use hashes of this type) or whether it's a limitation of
>> the format as implemented in JtR.
> 
> Instead of using just a flag, I'd prefer the actual maximum length
> supported by any of the target systems or applications using this hash
> type, even if we just make use of this maximum length for
> --list=format-details...

I agree, except that the figures might not be right from the start (often we simply do not know them), or they may become wrong over time (different versions). So it may just add confusion.

The least we can do is document what we know, in as much detail as possible, in source-code comments (e.g. "from empirical tests on version x.y", "per Acme Inc. spec for version z, see http://link.acme.com" or "according to RFC 31337").

> But what if the target system counts the length in characters, not in
> bytes?

Good question. This is already a "problem" in some parts of John. The typical case is NT and other formats that internally use UCS-2 or UTF-16. The "real" maximum length (in John) for those formats, when SSE2 or GPU enabled, is 27 characters of UCS-2 (or, in the worst case, as few as 13 characters of UTF-16 with surrogate pairs, though even I think that is academic).
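To make the 27 vs. 13 figures concrete, here is a minimal sketch (hypothetical names, not John's actual code) that counts how many characters fit in a budget of 27 UTF-16 code units. It assumes the budget is counted in code units and that a surrogate pair is never split, so a character that does not fit whole is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the 27 UTF-16 code units discussed above. */
#define MAX_UTF16_UNITS 27

/* UTF-16 code units needed for one Unicode code point:
   BMP characters take one unit, supplementary characters
   (U+10000..U+10FFFF) take a surrogate pair. */
static size_t utf16_units(uint32_t cp)
{
    return cp >= 0x10000 ? 2 : 1;
}

/* Number of leading code points that fit in max_units UTF-16 code
   units, stopping before any character that would not fit whole. */
static size_t chars_that_fit(const uint32_t *cps, size_t n, size_t max_units)
{
    size_t used = 0, i;

    assert(cps != NULL || n == 0);
    for (i = 0; i < n; i++) {
        size_t need = utf16_units(cps[i]);
        if (used + need > max_units)
            break;
        used += need;
    }
    return i;
}
```

With 40 copies of U+00E9 (a BMP character) this yields 27; with 40 copies of U+1F600 (a supplementary character) it yields 13, since a 14th surrogate pair would need 28 units.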

As long as we use ASCII or an 8-bit codepage like ISO-8859-1 or CP1252 as input, the maximum length (which is really "maximum number of octets of input, still in the input encoding") stays at 27. But when we use UTF-8, it is bumped to 3*27 = 81 octets, because in the worst case we need three octets of UTF-8 to build one (BMP) character of Unicode.

Now, since we bumped it beyond 27, we must take great care to truncate correctly within the format (because we might just as well get 81 bytes of ASCII, which is far too long). If we get that wrong, we get hideous bugs. Specifically, get_key() must return the plaintext truncated *exactly* the same way, or we'll end up with wrong stuff in john.pot. This is why we spent quite some time making the Test Suite exercise the --encoding options.
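A minimal sketch of the truncation rule described above, assuming valid UTF-8 input and host-order UTF-16 output (all names here are hypothetical, not John's format API). The key point is that the converter reports how many input octets it actually consumed, so the same prefix can later be handed back by get_key().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_UTF16_UNITS  27                      /* hypothetical limit */
#define MAX_INPUT_OCTETS (3 * MAX_UTF16_UNITS)   /* 81 octets of UTF-8 */

/* Convert UTF-8 to UTF-16, writing at most max_units code units to out.
   Stops before any character that would not fit whole (never splits a
   surrogate pair). Returns the number of UTF-8 octets consumed, i.e. the
   length of the plaintext that was actually hashed.
   Assumes the input is valid UTF-8; no validation is done. */
static size_t utf8_to_utf16_trunc(const unsigned char *in, size_t in_len,
                                  uint16_t *out, size_t max_units,
                                  size_t *units_out)
{
    size_t i = 0, u = 0;

    assert(out != NULL && units_out != NULL);
    while (i < in_len) {
        unsigned char c = in[i];
        uint32_t cp;
        size_t len;

        /* Decode the lead octet. */
        if (c < 0x80)      { cp = c;        len = 1; }
        else if (c < 0xE0) { cp = c & 0x1F; len = 2; }
        else if (c < 0xF0) { cp = c & 0x0F; len = 3; }
        else               { cp = c & 0x07; len = 4; }

        if (i + len > in_len)
            break;                      /* incomplete sequence at the end */
        for (size_t k = 1; k < len; k++)
            cp = (cp << 6) | (in[i + k] & 0x3F);

        size_t need = cp >= 0x10000 ? 2 : 1;
        if (u + need > max_units)
            break;                      /* character does not fit whole */
        if (need == 1) {
            out[u++] = (uint16_t)cp;
        } else {                        /* encode as a surrogate pair */
            cp -= 0x10000;
            out[u++] = (uint16_t)(0xD800 | (cp >> 10));
            out[u++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
        i += len;
    }
    *units_out = u;
    return i;
}
```

A format built along these lines would store the returned octet count per candidate and have get_key() return exactly that many octets of the original input, so 36 octets of ASCII are reported back as the 27 that were hashed.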

On this subject, I just moved the Unicode conversion of ntlmv2-opencl to the GPU, to reduce transfer size and offload the CPU. But it has unintuitive consequences. The "3*27" hack won't work (it would negate the change, and worse). So for UTF-8 this means we can use at most 27 octets of input, which is only 9 characters of UTF-16 in the worst case. Also, when running a UTF-8 wordlist in Russian or Greek (where nearly all characters are non-ASCII), the transfer size is not reduced at all compared to when the conversion took place on the CPU.
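The arithmetic behind both consequences can be checked with a small sketch (hypothetical helper names; it only counts encoded sizes and does no real host/GPU transfer). It compares the octets sent when the GPU converts (raw UTF-8) with those sent when the CPU converts (UTF-16), and counts how many characters fit in a raw 27-octet budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Octets needed to encode one code point in UTF-8. */
static size_t utf8_octets(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80)    return 1;
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Octets needed for the same code point in UTF-16. */
static size_t utf16_octets(uint32_t cp)
{
    return cp < 0x10000 ? 2 : 4;
}

/* Bytes per candidate if we send raw UTF-8 (GPU converts) versus
   already-converted UTF-16 (CPU converts). */
static void transfer_sizes(const uint32_t *cps, size_t n,
                           size_t *utf8_bytes, size_t *utf16_bytes)
{
    size_t a = 0, b = 0;
    for (size_t i = 0; i < n; i++) {
        a += utf8_octets(cps[i]);
        b += utf16_octets(cps[i]);
    }
    *utf8_bytes = a;
    *utf16_bytes = b;
}

/* Leading characters whose UTF-8 encoding fits in a raw octet budget. */
static size_t chars_in_octet_budget(const uint32_t *cps, size_t n, size_t budget)
{
    size_t used = 0, i;
    for (i = 0; i < n && used + utf8_octets(cps[i]) <= budget; i++)
        used += utf8_octets(cps[i]);
    return i;
}
```

For a Cyrillic word such as "пароль", every character takes two octets in both encodings, so both sizes come out at 12 bytes; for ASCII "password" the UTF-8 transfer is half the size (8 vs. 16). A run of three-octet characters (e.g. CJK) fits only 9 characters in a 27-octet budget.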

But I digress. Where were we? LOL?

magnum