Date: Wed, 16 Sep 2015 02:29:17 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array

On Wed, Sep 16, 2015 at 02:05:30AM +0300, Solar Designer wrote:
> First, I took the latest cracker.c. This got my prefetching disabled.
> And, surprise, it's now 2 seconds slower (I still have SHR=0):
>
> real 0m58.804s
> user 4m26.433s
> sys 0m18.934s
>
> Maybe the prefetching is sometimes helping, after all, even though for
> our current raw-md5 it's very limited (max_keys_per_crypt is only 12, so
> those loops don't actually go to the hard-coded maximum of 64
> outstanding prefetches).

Yes, it looks like prefetching started to help. Added
"#define CRACKER_PREFETCH", got my 57 seconds back.

real 0m57.781s
user 4m16.614s
sys 0m16.467s

That's just a little over 1 second difference (or 2 seconds when
comparing some other runs), but still a speedup.

Tried uncommenting the hash table prefetch line (in addition to the
normally used bitmap prefetch line), got a regression:

real 1m3.312s
user 5m4.454s
sys 0m16.826s

As a sanity check, tried commenting out both prefetch instructions,
and...

real 0m57.785s
user 4m21.243s
sys 0m15.629s

Oops. Almost same speed without actual prefetching. So maybe the
re-arranged code is somehow faster in this case, even when I don't
prefetch. Or maybe the CPU does speculative execution and thus
prefetches anyway, using those pointers that the following few loop
iterations would use (especially as the compiler unrolls the loop).

Another test, with:

*(volatile unsigned int *)b[slot];

instead of the SSE2 prefetch. Got a regression:

real 1m0.591s
user 4m30.447s
sys 0m15.558s

So at least SSE2 prefetch doesn't hurt, unlike this.

Alexander
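To make the comparison concrete, here is a minimal C sketch of the three variants tested above on a batch of bitmap lookups: no explicit touch, an SSE software prefetch (`_mm_prefetch` with `_MM_HINT_T0`), and a plain volatile load in the spirit of `*(volatile unsigned int *)b[slot];`. It is not the cracker.c code: the bitmap layout, `count_hits()`, `bitmap_set()`, `enum touch_mode` and the `MAX_BATCH` cap of 64 are hypothetical stand-ins, assumed only for illustration. What it shows is that a prefetch is only a hint that cannot fault, whereas the volatile dereference is a real load that the compiler must emit and the CPU must complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software prefetch: the SSE intrinsic on x86, GCC/Clang builtin elsewhere. */
#if defined(__SSE__)
#include <xmmintrin.h>
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)
#endif

/* Hypothetical cap on lookups touched per batch, mirroring the
 * "hard-coded maximum of 64 outstanding prefetches" in the message. */
#define MAX_BATCH 64

enum touch_mode { TOUCH_NONE, TOUCH_PREFETCH, TOUCH_VOLATILE };

/* Hypothetical one-bit-per-slot bitmap: slot h lives in word h >> 5. */
static void bitmap_set(uint32_t *bitmap, uint32_t mask, uint32_t hash)
{
    uint32_t h = hash & mask;
    bitmap[h >> 5] |= 1u << (h & 31);
}

static int bitmap_test(const uint32_t *bitmap, uint32_t mask, uint32_t hash)
{
    uint32_t h = hash & mask;
    return (bitmap[h >> 5] >> (h & 31)) & 1;
}

/* Two-pass batch lookup: pass 1 optionally touches every bitmap word the
 * batch will need, pass 2 does the real bit tests. The result is the same
 * in every mode; only the memory access pattern differs. */
static size_t count_hits(const uint32_t *bitmap, uint32_t mask,
                         const uint32_t *hashes, size_t n,
                         enum touch_mode mode)
{
    size_t hits = 0;

    assert(mode == TOUCH_NONE || mode == TOUCH_PREFETCH ||
           mode == TOUCH_VOLATILE);
    for (size_t base = 0; base < n; base += MAX_BATCH) {
        size_t end = (n - base > MAX_BATCH) ? base + MAX_BATCH : n;

        if (mode != TOUCH_NONE) {
            for (size_t i = base; i < end; i++) {
                const uint32_t *p = &bitmap[(hashes[i] & mask) >> 5];
                if (mode == TOUCH_PREFETCH)
                    PREFETCH(p);                         /* hint; does not fault */
                else
                    (void)*(volatile const uint32_t *)p; /* real load */
            }
        }
        for (size_t i = base; i < end; i++)
            hits += (size_t)bitmap_test(bitmap, mask, hashes[i]);
    }
    return hits;
}
```

All three modes return the same hit count, so any difference between them has to be measured (e.g. under `time`, as above) with a bitmap much larger than the caches. Since the compiler may unroll the second loop and the CPU may issue its loads out of order anyway, the explicit prefetch pass can end up contributing little, which may be part of why the run with both prefetch lines commented out was nearly as fast.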