Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 28 Feb 2011 11:13:02 +0100
From: bartavelle <>
Subject: Re: GSoC 2011

On 28/02/2011 06:56, Solar Designer wrote:
> I'd appreciate any comments and suggestions.

"Further research on and implementation of automatic rule set generation
based on previously-cracked passwords"

I am spending my spare time on such a project. My approach is to start
from a dictionnary and mutate it with as many rules as possible. I am
only using rules that do not add prefixes or suffixes. Right now I am
working with a simple rule list, but if this proves workable I will use
other approaches (exhaustive search or genetic algorithm optimization
come to my mind).

For each mutated dictionnary, I try to match them in a real password
list, such as the rockyou list, and I store what comes before and after
my mutated word in the actual password. I can then compute which
passwords can be found with a given (mutation, prefix, suffix) tuple.

I have done this part, and will implement coverage searching real soon,
so that I could select a "good" 100 rules set (I'll go for a greedy
algorithm, so it will be a heuristic).

My hypothesis is that the most common password selection technique, for
hard passwords, is to take a known word, mutate it, and add stuff before
and after.

There are two challenges that must be overcome :
* the matching part, to be efficient, requires a lot of memory if you
want to work with a big enough dictionnary. Same for the coverage
resolution. My implementation requires around 8G RAM and 60s per rule
for a 21965166 words dictionnary and 6025204 passwords, on a i7 860.
This means I cannot run several instances in parallel without swapping
as I only have 16G RAM. If somebody could point me on a time and memory
efficient multistring matching algorithm, I'm interested.
* I'm hacking stuff as it goes, and this produces unreliable programs.
Moreover, I don't know how to test properly if a program works, as there
is no reference that I know of. Given the memory requirement of the C
program, I can't even prototype it in a "simpler" language.

There are probably tons of other approaches to this problem, such as
that described by Matt Weir in his thesis. There is however a
distinctive lack of interest in them, probably due to the fact that it
is much harder than coding yet another md5 gpu bruteforcer, and that it
requires a lot of material, trial and error.

If somebody else has ideas on that topic, I would be really interested.
A better approach would be to start from the dictionnary and password
list, and, for each password, efficiently compute rules that could
transform some of the dictionnary words into it. I have no idea on how
to do this however.

BTW, I already computed a large prefix/suffix list from rockyou (and
other sources) and a wikipedia based dictionnary. It works quite well
and starts with :


It works quite well and you can find prefixes such as "ilove", "lil" and
suffixes such as "boy", "man", "love" that crack a lot of passwords.

I also wrote a quick hack that helps computing that kind of huge
dictionnaries that works like "sort -u" except it doesn't sort, and is
thus MUCH faster. That kind of tools, optimized for large data sets,
would be incredibly helpful as contributed tools for JtR.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.