Date: Fri, 25 May 2012 00:26:24 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: memory usage within JtR and possible ways to significantly reduce it.

On Thu, May 24, 2012 at 02:33:26PM -0500, jfoug wrote:
> The hot/cold does make a lot of sense. Very good way to try to keep the
> locality. For a 10 million candidate search, you still have 40 MB of hot
> memory and 120 MB of 'cold' memory, vs 160 MB of arbitrarily accessed memory. I
> am not sure that reducing from 160 MB to 40 MB will be that 'huge' of a
> help, but it might.

Yes. And Frank is correct: this is mostly for sizes that fit or almost fit
into some sort of cache. I imagine that 40 MB will fit into an L3 cache a
few years from now, though. A non-negligible portion of it may already
fit, resulting in some speedup now. In practice, though, the bitmaps will
be the hottest portion of the working set; but when there's a bitmap and
hash table hit, the hashes themselves also come into play.

> But I certainly understand the goal, and it certainly should not hurt
> overall memory usage.

Actually, if we split binary into hot/cold portions, then we'll need an
extra pointer per hash. (Well, unless there's some obvious way to derive
the address of the cold portion from the address of the hot portion, but
that's tricky.) For the current binary/source setup, you're right: moving
source to cold doesn't cost us extra memory overall, except maybe for a
small and fixed amount to maintain the second pool. In fact, it might save
memory on alignment gaps. We could save memory by eliminating the source
pointer and having it calculated as (char *)binary + binary_size, though.
The hot/cold thing prevents us from doing that.

> Did you have any problems with starting on the source() (or rebuild_hash()
> or whatever), within the current jumbo john (prior to 1.8), just to start
> working through any unforeseen issues? If you have no problem, then what
> interface would you like to use, so that I could start on that, and have
> it built using the interface you would like.

My problem is that it's not one of my priorities now. I can't even afford
to spend time on discussing this further now. Overall, I thought the
interface would be very similar to what you proposed.

> I am not looking at starting to split the binary (just yet), but am looking
> at starting on the 'optional' source() method, to eliminate having to have
> the hashes allocated (IF a format can recreate the hash, some may not be
> able to do that)?

Please feel free to experiment with that if you like.

Thanks,

Alexander
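The memory trade-off discussed above (deriving the source string's address from the binary's address vs. paying an extra pointer per hash once binaries move to a separate hot pool, plus an optional hook that regenerates the string instead of storing it) can be sketched in C. This is a minimal illustration under stated assumptions, not JtR code: BINARY_SIZE, combined_alloc(), combined_source(), struct hot_entry, hot_source(), rebuild_fn, entry_source(), and the toy hex_rebuild() "format" are all hypothetical names, and the rebuild hook only loosely mirrors the optional source() method mentioned in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed binary size for illustration (a 128-bit hash). */
#define BINARY_SIZE 16

/*
 * Combined layout: one allocation per hash holds the binary followed by the
 * NUL-terminated source string. No source pointer is stored; the string's
 * address is derived as (char *)binary + BINARY_SIZE.
 */
static void *combined_alloc(const unsigned char *binary, const char *source)
{
	size_t len = strlen(source) + 1;
	unsigned char *p = malloc(BINARY_SIZE + len);

	if (!p)
		return NULL;
	memcpy(p, binary, BINARY_SIZE);
	memcpy(p + BINARY_SIZE, source, len);
	return p;
}

static const char *combined_source(const void *binary)
{
	return (const char *)binary + BINARY_SIZE;
}

/*
 * Hot/cold layout: binaries live in their own (hot) entries, so the source
 * can no longer be found at a fixed offset; each hot entry carries an extra
 * pointer to its cold data.
 */
struct hot_entry {
	unsigned char binary[BINARY_SIZE];
	const char *cold_source;	/* the extra per-hash pointer */
};

static const char *hot_source(const struct hot_entry *h)
{
	return h->cold_source;
}

/*
 * Hypothetical optional rebuild hook: a format that can recreate its
 * ciphertext from the binary supplies one, so the string need not be kept;
 * a format that cannot passes NULL and relies on the stored copy.
 */
typedef void (*rebuild_fn)(const unsigned char *binary, char *out);

/* Toy format: the ciphertext is just the lowercase hex of the binary.
 * out must hold 2 * BINARY_SIZE + 1 bytes. */
static void hex_rebuild(const unsigned char *binary, char *out)
{
	for (int i = 0; i < BINARY_SIZE; i++)
		sprintf(out + 2 * i, "%02x", binary[i]);
}

static const char *entry_source(const void *entry, rebuild_fn rebuild, char *buf)
{
	if (rebuild) {
		assert(buf != NULL);
		rebuild(entry, buf);	/* regenerated on demand, never stored */
		return buf;
	}
	return combined_source(entry);	/* stored right after the binary */
}
```

For example, calling combined_alloc() with a 16-byte binary and its hex string, then entry_source(e, NULL, NULL), returns the stored string without any pointer field; passing hex_rebuild and a 33-byte buffer instead regenerates the same string from the binary alone. Once binaries move into a separate hot pool, the offset trick no longer applies, which is why struct hot_entry needs cold_source (8 extra bytes per hash on a typical 64-bit build).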