Date: Wed, 6 Mar 2013 02:36:31 +0100
From: magnum <>
Subject: Re: NetNTLMv1 and MSCHAPv2


On 7 Feb, 2013, at 5:02 , Solar Designer <> wrote:
> On Thu, Feb 07, 2013 at 07:56:01AM +0400, Solar Designer wrote:
>> As to speeding up NetNTLMv1 and MSCHAPv2 some more, if we care, we may
>> want to split the crypt_key[] array in two, one for 2-byte "hot"
>> portions of the output and the other for 14-byte "cold" portions (may
>> expand them to 16 bytes for faster index to offset calculation).
>> Allocating 21 or 22 bytes wastes cache space.  We only use 21 in
>> cmp_exact(), for one hash at a time - we could use a local variable
>> there instead.  I am not going to work on this now.  You may. :-)

The SIMD code path already separates nthash[] from crypt_key[], just to postpone work until it is needed. That is, we copy only the 2 hot bytes and then don't touch nthash[] until we reach cmp_one() [a.k.a. the thorough part of cmp_all()]. We currently do not take the opportunity to reduce the size of crypt_key[], which could hold just the 2 hot bytes. I just tried this, but it makes no difference on its own (it is actually slightly slower - wtf?).
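
For illustration, a minimal sketch of what such a hot/cold split could look like. All names here (hot[], cold[], store_hash(), cmp_all_hot(), MAX_KEYS) are hypothetical and are not the actual format code; the sketch only assumes that the 2 hot bytes are the last two bytes of each NT hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 64 /* hypothetical batch size, not the real value */

/* Hot array: only the last 2 bytes of each NT hash, scanned for every salt. */
static uint16_t hot[MAX_KEYS];
/* Cold array: the full 16-byte NT hashes, read only after a hot match. */
static uint8_t cold[MAX_KEYS][16];

/* Store one computed NT hash: full copy to cold[], last 2 bytes to hot[]. */
static void store_hash(size_t index, const uint8_t nthash[16])
{
    assert(index < MAX_KEYS);
    memcpy(cold[index], nthash, 16);
    hot[index] = (uint16_t)(nthash[14] | (nthash[15] << 8));
}

/* Return 1 if any of the first count candidates has the known 2 bytes. */
static int cmp_all_hot(uint16_t known, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (hot[i] == known)
            return 1;
    return 0;
}
```

With 2-byte elements, the scan in cmp_all_hot() touches 2 bytes per candidate instead of 21 or 22, which is the cache argument made in the quoted text. Whether that shows up as a measurable gain is a separate question.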

> Taking this a step further, we could store just a few bytes of the
> 14-byte portion, and recompute the rest of the NT hash in cmp_exact()
> when we have to.

This might do more good for the scalar code path than for SIMD. OTOH, maybe it could make SIMD scale better in OMP?

> ... and if we store e.g. just last 4 bytes of each NT hash, then we can
> get away with using just one array, like we do now.  2 bytes of each
> element will be directly compared to the known values for the 3rd DES
> block key, and 2 bytes before those last 2 will be compared to results
> of DES encryption of the 2nd block, which are computed as needed.
> I think I like this option best.  Separating the hot/cold arrays has
> some cost of its own, and we can avoid that by simply making each
> element this small (only 4 bytes).  The index to offset calculation
> becomes trivial (and supported in x86's addressing modes natively).

Maybe I'm being thick, but I don't follow. cmp_all() checks the known last two bytes of the NT hash. If they match (once in 64K), we calculate the first DES block and compare that. We need the first 7 bytes of the NT hash for this, so 2+7 bytes would get us as far as cmp_exact(). How would we use just 2+2 bytes?
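
To make the byte ranges concrete, here is a minimal sketch of how the 16-byte NT hash maps onto the three DES keys. It assumes the usual NetNTLMv1 construction (hash padded with 5 zero bytes to 21 bytes, then cut into three 7-byte keys); the function names are hypothetical, and the actual DES encryption of the challenge is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a 16-byte NT hash into three 7-byte DES keys.
 * Key 0 = bytes 0..6, key 1 = bytes 7..13,
 * key 2 = bytes 14..15 followed by 5 zero bytes. */
static void split_des_keys(const uint8_t nthash[16], uint8_t keys[3][7])
{
    uint8_t padded[21];

    assert(nthash != NULL && keys != NULL);
    memset(padded, 0, sizeof(padded));
    memcpy(padded, nthash, 16);
    for (int i = 0; i < 3; i++)
        memcpy(keys[i], padded + 7 * i, 7);
}

/* Expand a 7-byte key into the 8-byte form DES expects: 7 key bits in the
 * top of each byte, parity bit (least significant) left as 0. */
static void expand_des_key(const uint8_t k7[7], uint8_t k8[8])
{
    k8[0] = k7[0] >> 1;
    k8[1] = ((k7[0] & 0x01) << 6) | (k7[1] >> 2);
    k8[2] = ((k7[1] & 0x03) << 5) | (k7[2] >> 3);
    k8[3] = ((k7[2] & 0x07) << 4) | (k7[3] >> 4);
    k8[4] = ((k7[3] & 0x0F) << 3) | (k7[4] >> 5);
    k8[5] = ((k7[4] & 0x1F) << 2) | (k7[5] >> 6);
    k8[6] = ((k7[5] & 0x3F) << 1) | (k7[6] >> 7);
    k8[7] = k7[6] & 0x7F;
    for (int i = 0; i < 8; i++)
        k8[i] = (uint8_t)(k8[i] << 1);
}

/* The cheap cmp_all()-style check: only the 2 bytes feeding the 3rd key. */
static int third_key_bytes_match(const uint8_t nthash[16],
                                 const uint8_t known[2])
{
    return nthash[14] == known[0] && nthash[15] == known[1];
}
```

So the third key depends only on bytes 14..15, the first on bytes 0..6, and the second on bytes 7..13, which is why checking the first DES block needs 7 more bytes on top of the 2 hot ones.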

