Date: Tue, 15 Sep 2015 06:40:39 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array

Fred, magnum -

On Sun, Sep 13, 2015 at 08:28:08PM -0700, Fred Wang wrote:
> I use a 10-year-old Dell 2950 as my test environment, precisely because it uses slower memory, and more easily shows improvements. For my "standard" test case (MD5, 29 million hashes, a ~13 million entry dictionary, and best64 rules, yielding about 1 billion hash attempts to find about 1.7 million solutions)
>
> hashcat 3 minute 54 seconds
> mdxfind 1 minute 15 seconds (Judy only)
> mdxfind 47 seconds (Current code, Bloom filter + Judy)

With the attached patch, and running this command line:

time ./john -form=raw-md5 -w=10m.pass -ru=best64 -nolog -mem=999999999 -v=1 -fork=8 29m.txt

I am getting:

real    1m17.021s
user    6m46.399s
sys     0m20.028s

on 2x E5420 with 16 GB RAM.

The 8 processes have about 2 GB allocated each, which initially means a
little over 2 GB of real RAM for all of them, but as passwords get
cracked and pages get copied, the total memory usage grows to slightly
over 8 GB, unfortunately.  The patch reduces the copy-on-write
occurrences; without it, memory usage would be higher yet.  Of course,
this is still not great.

The patch shows my changes to john.conf (these are not to be committed).
These were most important:

-Save = 60
+Save = 600

-ReloadAtCrack = Y
+ReloadAtCrack = N

-ReloadAtDone = Y
+ReloadAtDone = N

-ReloadAtSave = Y
+ReloadAtSave = N

Effectively disabling the pot sync feature as above saves several
minutes.

This saves 4 seconds, but only when -mem=999999999 is also used:

-WordlistMemoryMap = Y
+WordlistMemoryMap = N

This saves 1 second:

-NoLoaderDupeCheck = N
+NoLoaderDupeCheck = Y

This makes almost no difference (the system is otherwise idle):

-Idle = Y
+Idle = N

I think there might still be wordlist duplicate suppression going on.
It would be nice to try disabling it.
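To illustrate why memory usage grows as the forked processes write to
memory they initially share, here is a minimal sketch.  It is not taken
from the attached patch, and the function name is hypothetical.  After
fork(), parent and child share the same physical pages until one of them
writes; the write makes the kernel give the writer a private copy of the
page, so the other process never sees the change.  Assuming a POSIX
system:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not code from john: the parent fills a buffer,
 * forks, and the child modifies one byte of it.
 * Returns 1 if the child saw its own write and the parent's copy is
 * unchanged, 0 otherwise, -1 on error.
 */
int cow_write_stays_private(size_t size)
{
    unsigned char *buf;
    pid_t pid;
    int status, ok;

    if (size == 0)
        return -1;
    buf = malloc(size);
    if (!buf)
        return -1;

    /* Touch every page so that it is backed by real memory before fork() */
    memset(buf, 0xAA, size);

    pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }

    if (pid == 0) {
        /* Child: this write is what forces a private copy of the page */
        buf[0] = 0x55;
        _exit(buf[0] == 0x55 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) != pid) {
        free(buf);
        return -1;
    }

    /* Parent: its own page must still hold the original value */
    ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 && buf[0] == 0xAA;
    free(buf);
    return ok;
}
```

Each page written this way costs one extra page of real RAM per writing
process, which is the growth described above.  The fewer pages a process
dirties after fork(), the more memory stays shared.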

In cracker.c, only the copy-on-write reducing changes actually help in
this benchmark.  I am not entirely confident that they don't break
anything in any other cracking mode, etc. - I'd appreciate some testing
of them in jumbo before I possibly get them into the core tree.

Prefetching doesn't help in this benchmark.  In fact, without it I was
getting 1 second lower running time:

real    1m16.055s
user    6m45.974s
sys     0m20.433s

That's two separate hunks in the patch (one with
"#include <emmintrin.h>" and the other with actual code), so perhaps
they should be excluded for now.

The change to PASSWORD_HASH_SIZE_FOR_LDR speeds up startup by a few
seconds.  The table size used is 16M elements, so 128 MB on 64-bit or
64 MB on 32-bit systems.  I think that's acceptable these days,
especially given that for tiny files (which is what people might process
on tiny systems) that's just address space rather than memory
allocation.  This memory is freed after loading is complete.

The change to POT_BUFFER_SIZE saves up to 1 second in this benchmark,
but it was helping a lot more before I disabled pot sync (it was still
way too slow).

The change to LOG_BUFFER_SIZE doesn't matter for this benchmark since I
used the -nolog option.

The addition of a source() method for raw-md5 helps a lot.  Without it,
and without the copy-on-write avoidance in cracker.c, I couldn't run 8
processes on this machine without it getting into swap.  Perhaps we
should add source() to more formats.

Alexander

View attachment "john-huge-opt1.diff" of type "text/plain" (8318 bytes)
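
For reference, the 128 MB and 64 MB figures quoted for the loader hash
table follow from 16M = 2^24 elements, under the assumption that each
element is one pointer (8 bytes on 64-bit, 4 bytes on 32-bit).  A
minimal sketch of that arithmetic, with a hypothetical helper name that
does not appear in the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes needed for a table of 2^log2_entries
 * elements of elem_size bytes each.  Assumes one pointer per element.
 */
uint64_t table_bytes(unsigned int log2_entries, unsigned int elem_size)
{
    return ((uint64_t)1 << log2_entries) * elem_size;
}
```

Here table_bytes(24, 8) is 134217728 bytes (128 MiB) and
table_bytes(24, 4) is 67108864 bytes (64 MiB).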