Date: Mon, 28 Feb 2011 11:13:02 +0100 From: bartavelle <bartavelle@...il.com> To: john-users@...ts.openwall.com Subject: Re: GSoC 2011 On 28/02/2011 06:56, Solar Designer wrote: > I'd appreciate any comments and suggestions. "Further research on and implementation of automatic rule set generation based on previously-cracked passwords" I am spending my spare time on such a project. My approach is to start from a dictionnary and mutate it with as many rules as possible. I am only using rules that do not add prefixes or suffixes. Right now I am working with a simple rule list, but if this proves workable I will use other approaches (exhaustive search or genetic algorithm optimization come to my mind). For each mutated dictionnary, I try to match them in a real password list, such as the rockyou list, and I store what comes before and after my mutated word in the actual password. I can then compute which passwords can be found with a given (mutation, prefix, suffix) tuple. I have done this part, and will implement coverage searching real soon, so that I could select a "good" 100 rules set (I'll go for a greedy algorithm, so it will be a heuristic). My hypothesis is that the most common password selection technique, for hard passwords, is to take a known word, mutate it, and add stuff before and after. There are two challenges that must be overcome : * the matching part, to be efficient, requires a lot of memory if you want to work with a big enough dictionnary. Same for the coverage resolution. My implementation requires around 8G RAM and 60s per rule for a 21965166 words dictionnary and 6025204 passwords, on a i7 860. This means I cannot run several instances in parallel without swapping as I only have 16G RAM. If somebody could point me on a time and memory efficient multistring matching algorithm, I'm interested. * I'm hacking stuff as it goes, and this produces unreliable programs. Moreover, I don't know how to test properly if a program works, as there is no reference that I know of. Given the memory requirement of the C program, I can't even prototype it in a "simpler" language. There are probably tons of other approaches to this problem, such as that described by Matt Weir in his thesis. There is however a distinctive lack of interest in them, probably due to the fact that it is much harder than coding yet another md5 gpu bruteforcer, and that it requires a lot of material, trial and error. If somebody else has ideas on that topic, I would be really interested. A better approach would be to start from the dictionnary and password list, and, for each password, efficiently compute rules that could transform some of the dictionnary words into it. I have no idea on how to do this however. BTW, I already computed a large prefix/suffix list from rockyou (and other sources) and a wikipedia based dictionnary. It works quite well and starts with : : $1 Al"123" $2 $a $3 Al"12" $7 Al"23" $s $5 Al"13" $4 It works quite well and you can find prefixes such as "ilove", "lil" and suffixes such as "boy", "man", "love" that crack a lot of passwords. I also wrote a quick hack that helps computing that kind of huge dictionnaries that works like "sort -u" except it doesn't sort, and is thus MUCH faster. That kind of tools, optimized for large data sets, would be incredibly helpful as contributed tools for JtR.
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.