Date: Wed, 6 Mar 2013 06:16:03 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: NetNTLMv1 and MSCHAPv2 magnum - I am replying without looking at the code, so it is possible that I recall something incorrectly. On Wed, Mar 06, 2013 at 02:36:31AM +0100, magnum wrote: > The SIMD code path already separates nthash from crypt_key just for the sake of postponing stuff until needed. That is, we only copy the 2 hot bytes and then don't touch nthash until we reach cmp_one() [a.k.a thorough part of cmp_all()]. We currently do not take the opportunity to reduce size of crypt_key - the latter could be just the 2 hot bytes. I just tried this but it makes no difference on its own (actually slightly slower - wtf?). I don't know why you got a slowdown, but in general for optimal cache usage we need to keep the size of hot arrays to a minimum (in bytes) - don't have hot and cold data intermixed. When the hot array's elements are smaller, we may increase max_key_per_crypt while still fitting in L1 data cache. This may provide some speedup. (On the other hand, we should not bring max_key_per_crypt too close to the bitmap's size, which is fixed at 64K bits, as that would make the bitmap ineffective.) > On 7 Feb, 2013, at 5:02 , Solar Designer <solar@...nwall.com> wrote: > > Taking this a step further, we could store just a few bytes of the > > 14-byte portion, and recompute the rest of the NT hash in cmp_exact() > > when we have to. > > This might do more good for scalar code path than for SIMD. Too much to recompute with SIMD? With scalar code, we'd recompute 1 hash per 64K; with SIMD we'd recompute e.g. 12 per 64K (if we have no scalar NTLM code in that build). This is still a negligible percentage, so performance will depend on other factors (changes in memory layout, etc.) I agree that it feels wasteful to compute e.g. 12 hashes when we only need 1. > OTOH maybe it could make SIMD scale better in OMP? Hardly. > > ... and if we store e.g. just last 4 bytes of each NT hash, then we can > > get away with using just one array, like we do now. 2 bytes of each > > element will be directly compared to the known values for the 3rd DES > > block key, and 2 bytes before those last 2 will be compared to results > > of DES encryption of the 2nd block, which are computed as needed. > > I think I like this option best. Separating the hot/cold arrays has > > some cost of its own, and we can avoid that by simply making each > > element this small (only 4 bytes). The index to offset calculation > > becomes trivial (and supported in x86's addressing modes natively). > > Maybe I'm thick now but I don't follow. cmp_all() checks the known last two bytes of the NT hash. If they match (once in 64K), we calculate the first DES block and compare that. We need the first 7 bytes of the NT hash for this so 2+7 bytes would make it until cmp_exact(). How would we use just 2+2 bytes? I think I was wrong, and you're right, but I'm not sure what exactly I was thinking when I wrote the above a month ago. ;-) Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.