Date: Fri, 17 Aug 2012 10:04:35 -0600
From: Kevin Young <>
Subject: Passphrase Creation

Hello everyone,

First off, thanks to Matt, Solar Designer, and the other John-users for
inviting me to participate in the CMIYC contest. I learned a lot and had a
great time.

I've been using passphrases for several months now and have seen some
chatter on the subject so I thought I'd chip in. Most of my phrase creation
is contained in a bash shell script. But I'm sure there's someone out there
with a much better tool, method, or way to do this.

Step 1. Find a good source of words
As mentioned in other posts, the Gutenberg project is a good source. I've
also tried mining the Library of Congress, and a few others.

Step 2. Store and organize
Storage proved an early challenge as I underestimated the space
requirements. The 15,000 raw (unprocessed) books I currently have fill a
300GB drive. It doesn't sound like much, but things grow quickly. An SSD
helps as disk I/O becomes a bottleneck.

Step 3. Download your material
I use a simple wget loop here. Don't saturate the bandwidth of your source
or you'll get booted.
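
A throttled download loop of this kind could be sketched in Python instead of wget, roughly as below. This is an illustration only, not the script described here: the names download_all, fetch_url, and delay_seconds, the 2-second default, and the book_NNNNN.txt file naming are all assumptions, and the URL list has to come from whatever your source publishes.

```python
import time
import urllib.request
from pathlib import Path


def fetch_url(url):
    """Download one URL and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls, dest_dir, delay_seconds=2.0, fetch=fetch_url):
    """Fetch each URL in turn, pausing between requests to spare the source."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        if index > 0:
            # Throttle: one request at a time, with a gap between them.
            time.sleep(delay_seconds)
        target = dest / f"book_{index:05d}.txt"  # hypothetical naming scheme
        target.write_bytes(fetch(url))
        saved.append(target)
    return saved
```

The fetch argument is only there so the loop can be tried out with a stand-in function before pointing it at a real server.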

Step 4. Scrub raw input
Strip special characters and punctuation. Convert to lowercase and remove
excess space characters (sed and awk). Convert between file formats if
necessary (dos2unix, unix2dos, or unix2mac). Using these commands I create
a single long "sentence".

It was a dark and stormy night. All the animals were asleep.
Somewhere overhead a flash of lightning illuminated the canyon walls
followed by the thunder's rumble.

it was a dark and stormy night all the animals were asleep somewhere
overhead a flash of lightning illuminated the canyon walls followed by the
thunders rumble
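
The same scrubbing can be sketched in Python rather than sed and awk. This is a minimal sketch under some assumptions: the function name scrub is made up, punctuation is deleted rather than replaced with a space (which is what turns "thunder's" into "thunders" above), and only ASCII letters and digits are kept, so accented characters would be dropped too.

```python
import re


def scrub(text):
    """Lowercase, drop punctuation, and collapse whitespace into single spaces."""
    text = text.lower()
    # Delete (not replace with a space) anything that is not an ASCII
    # letter, digit, or whitespace, so "thunder's" becomes "thunders".
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Line breaks, tabs, and runs of spaces all become one space.
    return " ".join(text.split())


raw = """It was a dark and stormy night. All the animals were asleep.
Somewhere overhead a flash of lightning illuminated the canyon walls
followed by the thunder's rumble."""

print(scrub(raw))
```

Because split() treats carriage returns as whitespace, DOS line endings disappear in the same pass.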

Step 5. Choose a phrase length and create phrases
I've tried phrase lengths from 3-10 words. Using the above example, a
maximum length of 5 words, and a custom app (arrays and recursion are your
friends here), phrase creation begins:

it was
it was a
it was a dark
it was a dark and
was a
was a dark
was a dark and
was a dark and stormy
a dark
a dark and
a dark and stormy
a dark and stormy night
dark and
dark and stormy
dark and stormy night
dark and stormy night all
and stormy
and stormy night
and stormy night all
and stormy night all the
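
The listing above amounts to a sliding window over the word list. A minimal sketch in Python follows; it uses plain nested loops instead of recursion, and the name make_phrases and the min_words/max_words parameters are assumptions chosen to match the 2-to-5-word output shown, not the custom app itself.

```python
def make_phrases(sentence, min_words=2, max_words=5):
    """Yield every run of min_words..max_words consecutive words."""
    words = sentence.split()
    for start in range(len(words)):
        for length in range(min_words, max_words + 1):
            end = start + length
            if end > len(words):
                break  # window would run past the end of the text
            yield " ".join(words[start:end])


sentence = "it was a dark and stormy night all the animals were asleep"
for phrase in make_phrases(sentence):
    print(phrase)
```

Since it is a generator, phrases can be written out as they are produced instead of being held in memory, which matters at the scale of whole books.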

I also create a no-space version at the same time. (Is there a mangling
rule that can handle this?)
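
Outside of any mangling rules, the no-space version is easy to produce alongside the spaced one. A small Python sketch, with the hypothetical name with_nospace_variants:

```python
def with_nospace_variants(phrases):
    """Yield each phrase followed by the same phrase with spaces removed."""
    for phrase in phrases:
        yield phrase
        yield phrase.replace(" ", "")


for candidate in with_nospace_variants(["it was a dark", "dark and stormy"]):
    print(candidate)
```

This doubles the size of the list, so it may be worth generating the variants only at cracking time.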


Step 6. Optimize and reduce
As expected there are a lot of duplicates, so my script performs a dictionary
sort and filters out the duplicates (sort and uniq). I also filter out
(grep) things like open source verbiage, distribution notices, credits, etc.
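
The sort, uniq, and grep stage could be sketched in Python as follows. The function name optimize and the BOILERPLATE patterns are hypothetical placeholders; the real filter list depends on what notices your sources carry, and broad words like "license" will also throw away some legitimate phrases.

```python
import re

# Hypothetical boilerplate markers; tune these to your own sources.
BOILERPLATE = re.compile(r"gutenberg|copyright|license|ebook")


def optimize(phrases):
    """Drop boilerplate phrases, remove duplicates, and return a sorted list."""
    unique = {p for p in phrases if not BOILERPLATE.search(p)}
    return sorted(unique)


print(optimize([
    "was a dark",
    "it was a",
    "was a dark",
    "project gutenberg ebook",
]))
```

Note that sorted() orders by code point rather than by locale, and that a set holds everything in memory, so for very large phrase files an external sort may still be the more practical route.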

Step 7. You're done
I typically get 1-5 million phrases per book. It isn't optimal but the
combinations are vast. (See sample phrases submitted for CMIYC 2012.) I've
plucked thousands of similar phrases from LinkedIn and Stratfor -- some
were as long as 28 characters. = : )

So there it is...I'm sure there are better ways to do this and I clearly
have a lot to learn. (Perhaps mangling rules can solve many of the
above-mentioned hurdles?) I still have a LOT of things to do to improve the
process but I'll save those tricks for CMIYC 2013 ;)

Thanks go to Matt Weir for his willingness to share a password dialog. I
also throw a shout to @joshdustin ( ) for his insight,
assistance, and suggestions -- the guy is a Linux wizard, white-hat genius,
and great friend.

If anyone has suggestions for improvement or questions, look me up.

Best of luck,


CMIYC 2012 sample:
He pondered a moment
rummaged in his pack
She was ashamed to
shorter space of time
to look at some
treatment of the slaves
I must be aware
you and your master
back of his head
panel in the wall
to his aid
more capable of giving
fathers shall eat
establishment of so many
have been here before
There are a few
a thousand years ago
then he was thinking
shall they utter
been able to find
