Date: Mon, 16 Jan 2012 18:49:09 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: bitmaps On Mon, Jan 16, 2012 at 06:25:26PM +0400, Solar Designer wrote: > On Mon, Jan 16, 2012 at 03:00:35AM +0400, Solar Designer wrote: > > hashes c/s VSZ RSS > ... > > 1M 13M 121 MB 115 MB > > 10M 5.7M 1046 MB 1039 MB ... > probably not much parallelism for RAM accesses (it's more like > pipelining). This got me thinking: what stops me from similarly scheduling multiple RAM accesses from a single thread? So I tried the following sequence: 1. Extract hash2 for index+1. 2. Extract hash1 for index. 3. Do bitmap lookup 1 for hash1, but don't use the value yet. 4. Do bitmap lookup 2 for hash2, but don't use the value yet. 5. Use the result of bitmap lookup 1. 6. Use the result of bitmap lookup 2. This got me to 7.5M c/s for one thread (non-OpenMP build) at 10M hashes, as measured at the same time point as the 5.7M above. For a longer running session, this reduces to 6.6M c/s presumably as less similar passwords are tried (lookups against the charsets in inc.c throw more of the bitmap out of cache?) At 1M hashes, the speedup is very small (since the bitmap fits in L2 cache here). This is not committed yet - it's a quick hack for testing only. Somehow moving the hash2 extraction to between bitmap lookup 1 and the use of its result made the speed worse. My guess is that this has to do with the compiler having to save the register (result of bitmap lookup 1) on stack for the hash function call, where it effectively "uses" the lookup result (even if just for saving it for later use). Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.