Date: Tue, 4 Sep 2012 05:49:00 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: memory usage (was: [john-users] JTR against 135 millions MD5 hashes)

Jim -

Thank you for sharing your thoughts on this.

On Mon, Sep 03, 2012 at 11:16:44AM -0400, jfoug@....net wrote:
> The memory reduction due to the source() function, and also due to removing some wasteful allocations within the prepare() and valid() within dynamic. I believe the wasteful allocations in dynamic have been removed, all the way back to the J7-RC. The memory requirements were greatly reduced early on, but some of that has slowly been lost again (the bitmaps). However, total memory usage is still much less than it was when we were keeping the full source buffer for each hash.

I did not realize that there were wasteful allocations in prepare() and valid(). Weren't they temporary?

Another potential source of memory usage reduction is the alignments. For raw MD5 hashes, a 4-byte alignment should suffice (they're 4x4 bytes), yet we were using 8-byte alignment on 64-bit builds. I've just checked: bleeding's rawMD5_fmt_plug.c is still using DEFAULT_ALIGN. I think this should be corrected to just "4". We should review other formats for excessive alignment like this as well.

As to the bitmaps, they use less memory than the hash tables used before. For the same size in terms of number of entries, bitmaps are obviously size-of-pointer-in-bits times smaller than hash tables. Yes, we also use hash tables along with the bitmaps, but those are smaller by PASSWORD_HASH_SHR (a right shift by 2, as specified in the current params.h). So overall we need almost 4 times less memory for the same number of entries, due to the introduction of bitmaps. However, at the same time the thresholds have been changed such that we now allocate bitmaps and hash tables for a larger number of entries. This is what may be increasing memory usage.
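To make the bitmap-plus-smaller-hash-table arrangement concrete, here is a minimal sketch of the two-level lookup. The names and sizes are hypothetical (not JtR's actual params.h values, and JtR's real code is structured differently): a bitmap with one bit per hash-function value rejects most misses, and a pointer table with 4x fewer buckets (the right shift by 2) resolves the rest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sizes only - not JtR's actual parameters */
#define HASH_LOG 20                       /* bitmap has 1 << HASH_LOG bits */
#define HASH_SIZE (1U << HASH_LOG)
#define HASH_SHR 2                        /* pointer table is 4x fewer entries */

typedef struct entry_s {
	struct entry_s *next;
	uint32_t binary;                  /* first 32 bits of the hash, for the hot check */
} entry;

static uint32_t *bitmap;                  /* 1 bit per possible hash value */
static entry **table;                     /* HASH_SIZE >> HASH_SHR chained buckets */

static void tables_init(void)
{
	/* error handling omitted in this sketch */
	bitmap = calloc(HASH_SIZE / 32, sizeof(uint32_t));
	table = calloc(HASH_SIZE >> HASH_SHR, sizeof(entry *));
}

static void tables_insert(entry *e)
{
	uint32_t h = e->binary & (HASH_SIZE - 1);
	bitmap[h >> 5] |= 1U << (h & 31);
	e->next = table[h >> HASH_SHR];   /* chain into the shrunken bucket */
	table[h >> HASH_SHR] = e;
}

static int maybe_present(uint32_t binary)
{
	uint32_t h = binary & (HASH_SIZE - 1);
	if (!(bitmap[h >> 5] & (1U << (h & 31))))
		return 0;                 /* definite miss: one bit tested, no chain walked */
	for (entry *e = table[h >> HASH_SHR]; e; e = e->next)
		if (e->binary == binary)
			return 1;
	return 0;
}
```

With these toy sizes on a 64-bit build, the bitmap is 128 KiB and the pointer table 2 MiB, versus 8 MiB for a full-size pointer table alone - the "almost 4 times less" figure from the paragraph above.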
For hash counts as large as 146 million (from my test), this shouldn't matter, though - since the largest size would have been picked anyway (and it now corresponds to almost 4 times smaller memory usage). Also, for saltless hashes we only need one bitmap and one hash table - for the entire john process. When its total allocation size is multi-gigabyte, spending 272 MiB (16 MiB bitmap plus 256 MiB hash table, if I calculated correctly) on the two combined is not a problem. (In fact, I think we should introduce even larger bitmaps and hash tables to better handle cases like that - this is on my to-do.) Before the introduction of bitmaps, we were spending 1 GiB.

> There are still other areas where significant memory reduction can be made:
>
> When we do use source(), we currently have to keep the full binary in memory, which is a waste: it is ONLY needed to regenerate the source to print it back out. In 'non' source() mode, we only need to keep 4 bytes in hot memory, for a binary check. I think we could reduce this from 16 bytes per hash (or 64 bytes for SHA-512!), to only 8 bytes per hash, with an auxiliary file storing the full array of binary hash data. Since the binary data is all fixed size, JtR only needs to use an index, vs. a real offset pointer, computing the true offset into the file at runtime. This should allow JtR to only need 4 bytes for the pointer into this file. That would allow running against 4 billion hashes at once, which should be an 'adequate' limitation. So this would drop us from 16 bytes to 8 bytes per hash, probably making up for the memory lost to the bitmaps (and quite a bit more). Run time should not be impacted much, except VERY early on, when a huge number of hashes are being found. Once past that early onslaught, having to look up data in an external file should not impact runtime at all. That file would ONLY need to be accessed when printing a found password (or in some of the --show functions).
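Jim's index-instead-of-pointer arithmetic can be sketched as follows; this is a hypothetical illustration (the names are made up, and BINARY_SIZE stands in for a format's fixed binary length), not JtR code:

```c
#include <assert.h>
#include <stdint.h>

/* With fixed-size binaries, a 32-bit record index replaces a 64-bit
   pointer; the true byte offset into the auxiliary file is recomputed
   on each (rare) access instead of being stored per hash. */
#define BINARY_SIZE 16	/* raw MD5; would be 64 for SHA-512 */

static inline uint64_t binary_offset(uint32_t index)
{
	/* widen before multiplying so large indices don't overflow */
	return (uint64_t)index * BINARY_SIZE;
}
```

A 4-byte index addresses about 4 billion records - the 'adequate' limit mentioned above - and together with the 4 hot bytes kept for the early-reject check gives the proposed 8 bytes per hash.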
Yeah, but I think hot/cold separation inside the process' virtual memory will work better than using external files. If a person wants swapping out to disk, they will probably have swap space configured in the OS - so we just need to be friendly to that. If a person has a lot of RAM and no other tasks competing for it, then the swapping will likely not happen even for the "cold" portions. I think our supporting essentially a swap file of our own would only be beneficial in some relatively obscure cases.

Yes, using 32-bit offsets instead of full 64-bit pointers is a way to save memory. It is usable with inside-address-space cold storage as well; the offsets don't have to be _file_ offsets. In fact, there may be semi-hot items that we could reasonably access via similar 32-bit or smaller offsets, with the same goal (saving memory in 64-bit builds). unique.c uses 32-bit offsets in this fashion. single.c uses 16-bit offsets for buffered candidate passwords (since there's a separate buffer for each salt, the savings are significant when the salt count is large). We might identify other places where this is beneficial and not too cumbersome.

> Second, IIRC JtR uses way too much memory when the input file is largely cracked.

Yes.

> If the input file has 140 million, but 115 million of them are cracked, there is a lot of memory used loading the hashes, then loading the .pot file hashes, then things get marked off, and a new list of the remainder candidates is created.

No, that's not true. The memory usage remains the same as if no passwords were previously cracked.

> But IIRC, this memory is not released (or not fully released).

Yes. The memory usage stays the same as if no passwords were removed.

> It would be nice if JtR would be able to load 140m into the same machine, no matter if there was a .pot file with 115m or not.

This is already true.
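As an aside, the 32-bit-offset technique described above (the style unique.c uses) can be sketched like this; the names, the arena size, and the fixed-capacity allocator are illustrative only, not JtR's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define NIL 0xFFFFFFFFu	/* sentinel index standing in for a NULL pointer */

/* A singly-linked list whose links are 4-byte arena indices instead of
   8-byte pointers: 8 bytes per node here vs. 16 with a real pointer
   (pointer + value + alignment padding) on a 64-bit build. */
struct node {
	uint32_t next;	/* index into arena[], or NIL */
	uint32_t value;
};

static struct node arena[1024];	/* toy fixed-size arena; real code would grow it */
static uint32_t arena_used;

static uint32_t node_alloc(uint32_t value, uint32_t next)
{
	uint32_t i = arena_used++;	/* no overflow check in this sketch */
	arena[i].value = value;
	arena[i].next = next;
	return i;
}

static uint32_t list_sum(uint32_t head)
{
	uint32_t sum = 0;
	while (head != NIL) {
		sum += arena[head].value;
		head = arena[head].next;	/* follow the index, not a pointer */
	}
	return sum;
}
```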
> Also, it would be nice if JtR had the same final runtime footprint no matter if you run against 25m hashes, or an input file with 140m where 115m were already found.

Yes, but this is tricky and it is a tradeoff.

> For the binary in a separate file, the biggest issue is that separate file: making sure it gets cleaned up, and what to do upon restart if one already exists. Leaving 2 GB or larger files scattered on a user's system is not a really good thing to do. However, I think changing from having the full binary in memory to having most of the binary live in an external file should reduce memory about 30-40% beyond what it has been reduced already.

Compared to the swap-friendliness (hot/cold separation inside the address space), using an external file would only have cosmetic benefits (except under some unusual circumstances, e.g. when the OS doesn't have swap configured), and it is not to everyone's taste. Simply being swap-friendly also avoids the cleanup problem. That said, on Unix-like systems (and appropriate filesystems), cleaning up is easy: we keep an fd to the opened file, but unlink the file right after we've created it. Then we just use the already-unlinked file. When the process terminates, the only instance of the fd is automatically closed and the filesystem blocks are automatically freed. It may be a bit nasty if the system crashes with the unlinked file still opened, though. The blocks would then be freed on next bootup, which is generally handled just fine - but a warning may be printed. Another nastiness is user confusion ("df" displays less free space than the person expects from listing the directories). Overall, I am for swap-friendliness rather than explicit swap files of our own.

> For trying to use the same memory for 25m vs. 140m with 115m in the .pot file, likely it would require something like offloading things to disk, being able to FULLY scrub memory, then reloading properly.
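The unlink-right-after-creation idiom described above can be sketched as follows (POSIX-only, with a made-up path template; real code would honor TMPDIR and report errors):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a scratch file, unlink it immediately, and keep using the fd.
   The kernel frees the blocks when the last descriptor is closed, even
   if the process dies without any explicit cleanup. */
static int unlinked_scratch_roundtrip(void)
{
	char path[] = "/tmp/scratch-XXXXXX";	/* hypothetical name template */
	int fd = mkstemp(path);
	if (fd < 0)
		return 0;
	if (unlink(path)) {	/* the name is gone; data stays reachable via fd */
		close(fd);
		return 0;
	}

	const char data[] = "cold binary data";
	if (write(fd, data, sizeof(data)) != (ssize_t)sizeof(data)) {
		close(fd);
		return 0;
	}

	char buf[sizeof(data)];
	ssize_t got = pread(fd, buf, sizeof(buf), 0);
	close(fd);	/* last fd closed: filesystem blocks reclaimed here */
	return got == (ssize_t)sizeof(data) && !memcmp(buf, data, sizeof(data));
}
```

Note that this is exactly where the two nastinesses above come from: until close(), "df" counts the blocks as used even though no directory lists the file.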
> There may be many other ways to do this, but often when there are this many allocations, that is about the only way to really free up the temporarily used pages and reclaim memory.

We could similarly do this inside the process' address space, causing some swapping on low-RAM systems. We'd rely on libc's malloc/free being able to release a large contiguous region of memory back to the OS, though - which won't always be the case (especially not if we allocated that region in small chunks rather than all at once). We could use mmap() to make this explicit, but that's less portable.

Another approach would be to compact the data in-place, memmove-like (adjusting the pointers as needed). That is, rather than just exclude a removed entry from linked lists, hash tables, etc., move another entry (from the end of the allocated region) into its storage and re-point the pointers for that other entry.

I'd just ignore this problem/wish for now. Solving it might not be worth the tradeoffs involved.

> For loading 140m vs. 140m with 115m in the .pot file on the same machine (i.e. using the same memory), we may have to do something along the lines of how unique is done, vs. loading the whole world into hot memory. This 'unique'-like reduction may also help with the problem talked about in the prior paragraph.

This is what we're doing already.

> For a short cracking session (or no cracking session, such as --show), load time is a very large part of overall time.

BTW, I think we need to replace the hard-coded PASSWORD_HASH_SIZE_FOR_LDR with a range of settings and choose a likely optimal one based on file size (with some thresholds). Things become trickier when multiple files are given on the command line, though (we'd have to loop over and stat() the filenames prior to loading to determine the total size). This is on my to-do, but it is not a priority.

> Again, the ideas are mind food. I have not put any time into code.
> But it is similar to when I was looking at doing source(). I KNEW JtR was using way more memory than was needed. It is all in finding the largest of the low-hanging fruit and picking it, and doing it in a way that has minimal adverse effect on runtime speed.

Whatever we do on this, if anything, should not introduce much complexity. I am concerned that some of these ideas do involve a substantial complexity increase.

Thanks again,

Alexander