Date: Fri, 20 Feb 2009 00:55:20 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: big wordlists

On Thu, Feb 19, 2009 at 05:33:02PM +0000, kalgecin@...il.com wrote:
> i've recently generated a 3gig wordlist using john's rules. but john
> couldn't open the wordlist. must i split the file or is there any
> other way?

If you feel you really want to use a wordlist this large, then currently the best way to do it would be to use a 64-bit build of JtR. It may be faster as well. Of course, your hardware and OS must be 64-bit capable.

32-bit builds of JtR are currently limited to wordlists of up to 2 GB.

On Thu, Feb 19, 2009 at 11:05:40AM -0700, RB wrote:
> Not addressing the size, but the methodology: why? This is precisely
> why wordlists are implemented the way they are in JtR - you have a
> basic dictionary, then apply mutators on the fly. The CPU cycles
> spent performing the mutations are miniscule in comparison to the rest
> of the processing, and [likely] orders of magnitude faster than
> waiting on disk I/O for such large lists.
>
> If you need the generated list for other purposes (like feeding to
> another process) that's one thing, but otherwise you're going to
> generally be better off letting JtR do what it's made to do.

That's right, but there is in fact a reason to have JtR pre-apply its mangling rules to a wordlist, yet use the resulting bigger wordlist with JtR itself:

Although the mangling rules (in the default ruleset) are designed to avoid producing duplicate candidate passwords, they nevertheless happen to produce some with most large wordlists. This is because it'd take a lot of processing (and more memory) to completely avoid duplicates, for all possible input wordlists.
For example, if the input wordlist has both "word" and "Word", and the ruleset has the following rules (which are part of the default ruleset):

[List.Rules:Wordlist]
# Try words as they are
:
# Lowercase every pure alphanumeric word
-c >3!?XlQ
# Capitalize every pure alphanumeric word
-c >2(?a!?XcQ

then JtR will try each of "word" and "Word" twice.

Of course, converting the input wordlist to all-lowercase, as suggested in doc/MODES, avoids this specific problem, but it may also lose valuable information (e.g., on entries such as "O'Brien"), and there are more subtle cases where duplicates may be produced by some combinations of word mangling rules in the default ruleset and some wordlist entries.

Thus, when cracking slow hashes and/or when the number of different salts is large, it makes sense to pre-mangle the wordlist and pass the resulting stream of candidate passwords through "unique" - the program included with JtR (and in fact linked into the JtR program binary, making "unique" merely a symlink to "john"). Due to the way "unique" works, the output has to be saved to a file. Then the resulting file, with no duplicates in it, may be used for the slow cracking. An example of how to invoke "john" and "unique" in this way is given in doc/EXAMPLES.

Alexander
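To make the duplicate problem and the deduplication step concrete, here is a minimal Python sketch. It is not JtR code: it re-implements, in simplified form, only the three rules quoted above, assuming a case-sensitive hash type (so the "-c" flag never rejects anything) and treating only ASCII letters and digits as alphanumeric. All function names (is_pure_alnum, rule_as_is, rule_lowercase, rule_capitalize, mangle, unique) are hypothetical. Like wordlist mode, it applies each rule to the whole list before moving on to the next rule; the unique() generator drops repeats while keeping first-seen order, but, unlike the real "unique" program, it simply keeps every candidate in an in-memory set.

```python
# Hypothetical, simplified re-implementation of three rules from JtR's
# default [List.Rules:Wordlist] section, for illustration only.
# Assumptions: the hash type is case-sensitive (so "-c" never rejects),
# and only ASCII letters and digits count as "alphanumeric".


def is_pure_alnum(word: str) -> bool:
    """True if every character is an ASCII letter or digit (the !?X test)."""
    return all(ch.isascii() and ch.isalnum() for ch in word)


def rule_as_is(word):
    # ":"  try the word as it is
    return word


def rule_lowercase(word):
    # "-c >3!?XlQ": longer than 3 chars, pure alphanumeric, lowercase it,
    # then reject the result if lowercasing did not change the word (Q)
    if len(word) > 3 and is_pure_alnum(word):
        out = word.lower()
        if out != word:
            return out
    return None


def rule_capitalize(word):
    # "-c >2(?a!?XcQ": longer than 2 chars, first char is a letter,
    # pure alphanumeric, capitalize it, reject the result if unchanged (Q)
    if (len(word) > 2 and word[0].isascii() and word[0].isalpha()
            and is_pure_alnum(word)):
        out = word.capitalize()
        if out != word:
            return out
    return None


RULES = [rule_as_is, rule_lowercase, rule_capitalize]


def mangle(wordlist):
    """Yield candidates rule by rule, each rule applied to the whole list."""
    for rule in RULES:
        for word in wordlist:
            candidate = rule(word)
            if candidate is not None:
                yield candidate


def unique(candidates):
    """Drop repeated candidates while keeping first-seen order."""
    seen = set()  # grows with the number of distinct candidates
    for candidate in candidates:
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


if __name__ == "__main__":
    words = ["word", "Word", "O'Brien"]
    mangled = list(mangle(words))
    print(mangled)                # ['word', 'Word', "O'Brien", 'word', 'Word']
    print(list(unique(mangled)))  # ['word', 'Word', "O'Brien"]
```

Running it shows "word" and "Word" each produced twice, while "O'Brien" is left alone by the two case-changing rules because it is not pure alphanumeric. The set in unique() is what makes complete deduplication cost memory in proportion to the number of distinct candidates. The real "unique" is designed to work within a limited memory buffer, which is presumably part of why it writes its output to a file.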