john-dev - Re: Latin-1 to UTF-16 conversion

Message-ID: <0f8e7d9cd519cdceb426b31193a5b188@smtp.hushmail.com>
Date: Fri, 31 Jul 2015 23:49:06 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Latin-1 to UTF-16 conversion

On 2015-07-29 05:35, Lei Zhang wrote:
> I looked at set_key() in mssql05 and nt2, which both convert latin-1
> to utf-16 into SIMD key buffer. Yet there're still some details I
> don't understand.
>
> 1. mssql05 uses SHA1 and nt2 uses MD4, both of which use the same
> padding scheme, except for the endianness of the padded length at the
> tail of the block. But their code for converting are somehow
> different,
>
> e.g. in mssql05's set_key():
> 	*keybuf_word = JOHNSWAP((temp << 16) | temp2);
> and in nt2:
> 	temp2 |= (temp << 16);
> 	*keybuf_word = temp2;
>
> Why is there no endianness swapping in nt2?

MD4/5 are little endian, so the packed UTF-16 word can be stored as is. 
SHA-1 is big endian, hence the JOHNSWAP in mssql05.
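As a rough illustration of the difference, here is a minimal sketch of packing two Latin-1 characters into one 32-bit key-buffer word for each kind of hash. The function names (swap32, pack_utf16_pair, word_for_md4, word_for_sha1) are made up for this example and are not taken from the JtR source; swap32 merely stands in for what JOHNSWAP does.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word (stand-in for JOHNSWAP). */
uint32_t swap32(uint32_t x)
{
	return (x << 24) | ((x & 0x0000ff00U) << 8) |
	       ((x >> 8) & 0x0000ff00U) | (x >> 24);
}

/*
 * Latin-1 code points equal the first 256 Unicode code points, so each
 * character widens to UTF-16 by zero extension. Two UTF-16LE code units
 * fit in one 32-bit word, first character in the low half.
 */
uint32_t pack_utf16_pair(unsigned char first, unsigned char second)
{
	return (uint32_t)first | ((uint32_t)second << 16);
}

/* MD4/MD5 read message words little endian: store the word as is. */
uint32_t word_for_md4(unsigned char first, unsigned char second)
{
	return pack_utf16_pair(first, second);
}

/* SHA-1 reads message words big endian: swap before storing. */
uint32_t word_for_sha1(unsigned char first, unsigned char second)
{
	return swap32(pack_utf16_pair(first, second));
}

/* Usage: "AB" is the bytes 41 00 42 00 in UTF-16LE. */
void pack_demo(void)
{
	assert(word_for_md4('A', 'B') == 0x00420041U);
	assert(word_for_sha1('A', 'B') == 0x41004200U);
}
```

The sketch works on word values rather than on bytes in memory, which is also how a SIMD hash core consumes the key buffer.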

> 2. In mssql05's set_key():
> 	unsigned int *keybuf_word = (unsigned int*)&saved_key[GETPOS(3, index)];
>
> What's the intention of the number 3 here? Salts are appended to
> message in mssql05, so this is not for preserving space for salt. And
> the salt size is not 3 anyway.

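This is also due to byte order. We "mis-use" the GETPOS macro (which is 
char oriented) to get a pointer to a 32-bit word. In the big-endian 
layout the byte index within each word is reversed, so index 3 is the 
one that lands on the start of the word. For MD4/5 it would be just 
GETPOS(0, index).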
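To make the role of the 3 concrete, here is a simplified sketch of a byte-oriented position function for an interleaved SIMD key buffer. It is not the actual GETPOS macro from either format: LANES, getpos_le and getpos_be are hypothetical names, the buffer is assumed to hold a single group of LANES candidates, and the host is assumed to be little endian.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4	/* hypothetical SIMD width: four 32-bit lanes */

/*
 * Byte offset of message byte i for candidate `lane`. Word w of every
 * lane is stored side by side, so word w of a lane starts at byte
 * offset (w * LANES + lane) * 4.
 */

/* Little-endian hash (MD4/MD5): byte i keeps its place within its word. */
size_t getpos_le(size_t i, size_t lane)
{
	assert(lane < LANES);
	return (i & ~(size_t)3) * LANES + lane * 4 + (i & 3);
}

/* Big-endian hash (SHA-1): bytes are mirrored within each word. */
size_t getpos_be(size_t i, size_t lane)
{
	assert(lane < LANES);
	return (i & ~(size_t)3) * LANES + lane * 4 + (3 - (i & 3));
}
```

With this layout getpos_be(3, lane) and getpos_le(0, lane) both return lane * 4, the word-aligned start of the lane's first word, while getpos_be(0, lane) returns lane * 4 + 3 and would not be usable as a 32-bit pointer.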

> BTW, there're so many hardcoded values in the code for SIMD buffer
> handling. This would cause a lot of headaches for a newcomer...

Yes, much of that code could use some editing for clarity.

> 3. I see that the returned value in get_salt() and get_binary() are
> sometimes endianness-swapped for a SIMD build and sometimes not.
> What's the point here?

The naive approach would be to use a normal binary and a normal SHA 
function, which ends by byte-swapping every candidate hash it produces.

A faster approach is obviously to byte-swap the binary when loading the 
hash (way out of all hot loops), then always skip the last byte-swap in 
the hash function. So we compare the candidate hashes in "wrong 
byte-order" but who cares, it ends up right. For SHA-1 this saves us 
about a hundred million byte-swaps per second, per core...
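A minimal sketch of that idea, assuming a SHA-1 core that leaves its five 32-bit state words unswapped. The names prepare_binary and cmp_one are chosen for this example and are not claimed to match the functions in any particular format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Done once per loaded hash, outside all hot loops: read the 20 digest
 * bytes as five big-endian words so that they match the raw state words
 * of the SHA-1 core. On a little-endian host this amounts to a byte
 * swap of each word.
 */
void prepare_binary(const unsigned char digest[20], uint32_t out[5])
{
	for (int i = 0; i < 5; i++)
		out[i] = ((uint32_t)digest[4 * i] << 24) |
		         ((uint32_t)digest[4 * i + 1] << 16) |
		         ((uint32_t)digest[4 * i + 2] << 8) |
		         (uint32_t)digest[4 * i + 3];
}

/* Done per candidate: compare raw state words, no final swap needed. */
int cmp_one(const uint32_t state[5], const uint32_t prepared[5])
{
	return memcmp(state, prepared, 5 * sizeof(uint32_t)) == 0;
}

/* Usage with the well-known test vector SHA-1("abc"). */
void sha1_abc_demo(void)
{
	const unsigned char digest[20] = {
		0xa9, 0x99, 0x3e, 0x36, 0x47, 0x06, 0x81, 0x6a, 0xba, 0x3e,
		0x25, 0x71, 0x78, 0x50, 0xc2, 0x6c, 0x9c, 0xd0, 0xd8, 0x9d
	};
	/* What the core would hold for "abc" before any output swap. */
	const uint32_t state[5] = {
		0xa9993e36U, 0x4706816aU, 0xba3e2571U, 0x7850c26cU, 0x9cd0d89dU
	};
	uint32_t prepared[5];

	prepare_binary(digest, prepared);
	assert(cmp_one(state, prepared));
}
```

The per-candidate path here never touches byte order; the only conversion happens in prepare_binary, once per loaded hash.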

magnum
