Date: Sun, 27 Jan 2008 08:25:20 +0300 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: wordlist compression and runtime decompression (was: wordlist mode improvement maybe) Replying to both helleye and Marti at once: On Sat, Jan 26, 2008 at 08:41:55PM +0000, helleye wrote: > so when we read the new wordlist , we will need to read 8 chars total, > updating just the new chars > (not counting crlf counters and spaces , but i think it will worth it on real > sorted wordlists) ,instead of 16 You have almost re-invented Crack 5's DAWG (Directed Acyclic Word Graphs) compression. It's a good thing, but it's not important to have, and it will confuse some users. Thus, it stays on my to-do list for JtR as a low priority item for years. Now that there's JtR Pro, which includes a large wordlist pre-packaged along with the program, I am actually considering implementing this as it would reduce the download size for the package (currently at 13 MB). > you think it will improve speed ? Not with Unix password hashes. It might improve speed in special cases, such as when running a huge number of wordlist rules against very fast saltless hashes (or against a single fast hash, in which case it may as well be salted). However, the primary purpose would be to reduce the size of downloadable packages, to save hard drive space when the packages are installed, and maybe to save RAM that the OS uses for caching the wordlist (see below). On Sun, Jan 27, 2008 at 05:23:56AM +0200, Marti Raudsepp wrote: > ... cracking speed is usually CPU-bound (not I/O bound). This is correct - except in special cases I've mentioned above. In fact, even in those special cases, a decent OS will cache the entire wordlist in RAM (as long as there's enough RAM that would otherwise stay unused), so reading the wordlist is CPU-bound and memory-bound, too. One annoyance here is the access time updates on the file, unless disabled with the "noatime" mount option or equivalent. Maybe a future version of JtR should optionally read the entire wordlist into memory on its own to avoid this OS overhead. Then DAWG compression would make more sense (can store the wordlist in memory as DAWG-compressed). > Doing runtime decompression would increase CPU usage rather > than reduce it. This is not certain. The OS kernel and C library (stdio) overhead involved in reading data from a (cached) file is comparable to or even bigger than the computational cost of DAWG decompression, which is almost negligible. > However if there really is a good reason, you could pull this off > using widely available compression tools like gzip/bzip2/lzma by > piping their output to john's stdin. They would probably also have > much better compression ratios (>5 times is normal for text files). Of course, but: 1. These will in fact increase CPU usage, although this is fine if the data is actually being read off disk platters (not yet cached) and/or a CPU core would otherwise stay idle. 2. It has been shown that the combination of DAWG and gzip provides much better overall compression for sorted wordlists than gzip alone does. 3. "john --stdin" is incompatible with wordlist rules, precisely because JtR does not load entire wordlists into memory. 4. Built-in support for "serious" compression in JtR would introduce dependencies on external libraries and/or result in code bloat (if such support is fully implemented in JtR's own source code). That said, optional dependencies on external libraries (and thus optional support for the corresponding compression methods) may be OK. Given the above, you can see that DAWG is a more likely candidate for inclusion into JtR itself. In fact, it's not really an alternative to the "serious" compression methods since both can co-exist nicely. -- Alexander Peslyak <solar at openwall.com> GPG key ID: 5B341F15 fp: B3FB 63F4 D7A3 BCCC 6F6E FC55 A2FC 027C 5B34 1F15 http://www.openwall.com - bringing security into open computing environments -- To unsubscribe, e-mail john-users-unsubscribe@...ts.openwall.com and reply to the automated confirmation request that will be sent to you.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.