Date: Wed, 20 Jul 2011 15:42:02 -0500
From: "jfoug" <>
To: <>
Subject: changes to --stdin  (working with rules, etc)

I have added a new command line switch.  I am calling it --pipe.  It is
similar to --stdin, but it is handled within the cracking loop in a much
different manner.


One of the things I have most wanted out of the stdin processing is the
ability to use --rules.  The existing rules will NOT work with stdin.  I made
some flag changes and found that this is due to how john processes rules:
john runs a single rule against all words in the set, then rewinds the file
and runs the next rule against all words.  This works fine with a file (but
can cause slowdowns, as I will discuss later); however, for stdin there is
no guarantee that fseek(0) will work.  It does not on Linux or Cygwin.  It
does work on Win32 'native' builds, but I think that is due to how
redirection is 'faked' inside of Windows cmd.exe.


Well, back to the slowdown in rules.  This was one of the big reasons I
added the wordfile_memory block, which reads the entire file into memory,
builds a set of pointers to the line starts, and then runs from this
preloaded memory.  This greatly improves the throughput of rules (and even
'normal' processing) on some systems.  Since that code was added, I had the
idea that we could handle stdin in this manner: simply load it into memory
and process it the same way.
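The wordfile_memory idea can be sketched roughly like this (a minimal illustration, not John's actual code; `load_words` is a name I made up): slurp the whole file into one buffer and record a pointer to the start of each line, so every rule can re-walk the words from memory instead of re-reading the file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Read an entire wordlist into one buffer and build an array of
 * pointers to the line starts.  Newlines are overwritten with NULs so
 * each entry is a usable C string.  Caller owns lines[] and the
 * underlying buffer (lines[0]). */
char **load_words(const char *path, size_t *count)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *buf = malloc(size + 1);
    fread(buf, 1, size, f);
    buf[size] = '\0';
    fclose(f);

    size_t cap = 16, n = 0;
    char **lines = malloc(cap * sizeof *lines);
    for (char *p = buf; *p; ) {
        if (n == cap)
            lines = realloc(lines, (cap *= 2) * sizeof *lines);
        lines[n++] = p;
        char *nl = strchr(p, '\n');
        if (!nl) break;
        *nl = '\0';          /* terminate the word in place */
        p = nl + 1;
    }
    *count = n;
    return lines;
}
```

With the words preloaded, each rule iterates over `lines[0..count-1]` directly, avoiding a file rewind and re-read per rule.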

However, there were some serious issues with that design.  First, how to
handle 'real' stdin, where the user simply types in passwords?  That would
not work well at all.  Second, a process that generates words can produce
more than the available memory.  That is also not a good fit.


To work around these 2 issues, I did the following.  First, I created a new
--pipe command line switch.  It is 'like' the --stdin switch, but is meant
to tell john that the input is a redirection or pipe, not someone typing at
the stdin.  Thus --stdin keeps its functionality intact, and the user can
still use that switch to type in passwords.  In that case, all we do is what
was done before, namely assign   word_file = stdin;   and let the run
process one word at a time (also, within options, --stdin and --rules are
not valid together).  Second, code was added to the wordlist cracking loop
so that in --pipe mode it allocates the needed buffers on the first pass
through the loop.  The buffer is allocated at the 'max' size allowed in the
options, and the number of line pointers is allocated as this max size / 13.
Having the line-count 'guess' assume lines averaging 12 bytes seems about
right.  The code worked 'almost' from the start, but I had to find a few
bugs (things that could only happen one time).  But I have gotten things to
work.
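The --pipe refill step might look roughly like this (a hypothetical sketch; `fill_batch` is my name, and only the LINEBUF_LEN value of 16k comes from the message below): read whole lines into a fixed buffer, recording a pointer per line, and stop filling once fewer than LINEBUF_LEN bytes remain so the next read cannot overrun.  Each filled batch is then run through all the rules before refilling.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define LINEBUF_LEN (16 * 1024)

/* Fill a fixed-size buffer from a non-seekable stream, one line per
 * slot, stopping when fewer than LINEBUF_LEN bytes remain in the
 * buffer.  Returns the number of lines batched; 0 means EOF. */
size_t fill_batch(FILE *in, char *buf, size_t bufsize,
                  char **lines, size_t maxlines)
{
    size_t used = 0, n = 0;
    while (n < maxlines && bufsize - used > LINEBUF_LEN) {
        char *p = buf + used;
        if (!fgets(p, LINEBUF_LEN, in))
            break;                      /* EOF: process what we have */
        p[strcspn(p, "\r\n")] = '\0';   /* strip the newline */
        lines[n++] = p;
        used += strlen(p) + 1;
    }
    return n;
}
```

Sizing `maxlines` as bufsize / 13 matches the "average 12-byte line plus terminator" guess described above.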


$ cat ../run/password.lst | ../run/john -pipe -rules --stdout -mem=24000 | grep words:

words: 141273  time: 0:00:00:00  w/s: 1296K  current: Halling


$ ../run/john -w=../run/password.lst -rules --stdout -mem=24000 | grep words:

words: 141273  time: 0:00:00:00 DONE (Wed Jul 20 14:47:58 2011)  w/s: 1121K  current: Halling


$ cat ../run/password.lst | ../run/john -pipe -rules --stdout | grep words:

words: 141273  time: 0:00:00:00  w/s: 1552K  current: Halling


$ cat ../run/password.lst | ../run/john -stdin -rules --stdout

Invalid options combination or duplicate option: "-rules"


The first test used a very small memory buffer, which caused it to be loaded
numerous times.  NOTE: the code as written stops 'filling' when it is within
LINEBUF_LEN (16k) of the end of the buffer, so at 24k the usable buffer is
actually only 8k.  The 2nd test is the 'normal' run, using the --wordlist=
option.  The 3rd test is the pipe with a single read.  The 4th test is
--stdin, which fails in options processing.


Now, when the entire input file cannot fit into memory at once, the actual
'order' of the words produced will not be the same.  It processes as if run
on multiple parts of the file, running the rules on each group
independently.  But I have sorted and uniq'd the data from a -pipe -rules
run and a -w= -rules run, and the actual words built were 100% the same.


I have not placed these online just yet.  I want to run some additional
tests to make sure things are working right.



