john-dev - Re: problem with disc space for shared files in MJohn


Message-ID: <20120717114713.GA8658@debian>
Date: Tue, 17 Jul 2012 15:47:13 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: problem with disc space for shared files in MJohn

On Tue, Jul 17, 2012 at 10:28:36AM +0200, Frank Dittrich wrote:
> On 07/15/2012 01:24 PM, Aleksey Cherepanov wrote:
> > I heard that some users have about 40gb of wordlists individually.
> > Currently it would be a problem if MJohn would copy all files to the
> > server.
> 
> We might even need some precautions against using dictionaries that
> differ only in the sequence of words.
> 
> Imagine someone got rockyou.txt in the original sequence (sorted by
> descending frequency), and someone else sorted the file alphabetically.
> (There might even be different sort sequences, depending on locale
> settings.)
> 
> It is obvious that running the same kind of attacks using both of these
> files is pointless.
> 
> We can't just treat both versions of the file as the same file.
> Otherwise, an interrupted session cannot be restored on another client.
> 
> Furthermore, the file sorted by frequency usually is the preferred one.
> (Just in case later on we just want to try more complex rules on the top
> 1000 passwords of this wordlist...)
> 
> Similar issues could exist with two files that only differ in line
> endings (<LF> vs. <CR><LF>).

We could normalize all files before sharing, i.e. s/\r?\n/\n/ .
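
A minimal sketch of that normalization in Python, assuming the files are
handled as raw bytes so that no encoding has to be guessed. The function
names are hypothetical and not part of MJohn. A lone <CR> (old Mac style)
is left untouched, exactly as the substitution above would leave it.

```python
import re


def normalize_line_endings(data: bytes) -> bytes:
    """Apply s/\\r?\\n/\\n/ globally: every CRLF becomes a single LF."""
    return re.sub(rb"\r?\n", b"\n", data)


def normalize_file(src_path: str, dst_path: str) -> None:
    """Normalize a file line by line so a large wordlist is not read at once.

    Iterating a binary file splits on LF, so a CRLF pair always stays
    within one line and can be rewritten safely.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(normalize_line_endings(line))
```

Working on bytes keeps this independent of the encoding question raised
below.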

> Even files which just use different encodings should be taken care of,
> if the majority of words contain just ASCII characters, and only very
> few contain non-ASCII characters.
> 
> Do we need to implement some checks for newly added files, and issue a
> warning whenever a new file is added which has the same size and/or same
> number of lines as an already existing file?

What if someone adds one word to the wordlist? (For a plain wordlist
attack we would only need to try the one additional word, but if we
care about more complex attacks, like mixing words into passphrases,
then it is much harder to extract the complement.)
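
For the plain wordlist case, the complement could be extracted roughly
like this. It is only a sketch: the function name is hypothetical, it
assumes the already tried list fits in memory as a set (not true for
40gb of wordlists), and it does nothing for the passphrase case.

```python
from typing import Iterable, Iterator


def untried_words(tried: Iterable[str], candidate: Iterable[str]) -> Iterator[str]:
    """Yield words from `candidate` that are not in `tried`.

    The order of `candidate` is preserved and repeated words are
    yielded only once.
    """
    seen = set(tried)
    for word in candidate:
        if word not in seen:
            seen.add(word)
            yield word
```

With an old list of "a", "b" and a new list of "b", "c", "a", "d", "c",
only "c" and "d" would be left to try.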

For a first version I think this is not necessary. There are some easy
checks (e.g. number of lines) that help find similar files.

I think it would be hard to handle all cases, and I doubt it is worth
it. But we could add something more general, such as an estimate of
the size of the diff (though that would not catch encoding differences,
where many lines differ in encoding but are the same in meaning).
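
One way such an estimate might look, as a sketch only. Note that this is
not a real line-by-line diff: it compares the sets of lines, so it
ignores ordering on purpose (a resorted rockyou would show up as
identical). The function name and the returned keys are made up for
illustration, and holding both sets in memory is an assumption that
would not scale to very large wordlists.

```python
from typing import Dict, Iterable, Union


def overlap_report(lines_a: Iterable[bytes],
                   lines_b: Iterable[bytes]) -> Dict[str, Union[int, float]]:
    """Estimate how much two wordlists differ, ignoring word order."""
    a, b = set(lines_a), set(lines_b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    union = common + only_a + only_b
    return {
        "common": common,
        "only_a": only_a,
        "only_b": only_b,
        # Jaccard similarity: 1.0 means the same set of words.
        "similarity": common / union if union else 1.0,
    }
```

A warning could then be issued when "similarity" is above some threshold.
Words that differ only in encoding still count as different here.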

> (OTOH, the same problem can exist with rule sections only differing in
> the sequence of rules, or rules sections with many overlapping rules.)

I think for now this should be done manually. We should compensate
for it in other places: for instance, with good dispatching we could
assume that an experienced member would not start an attack with an
alphabetically sorted rockyou, while an attack from an inexperienced
member would not take much CPU time.

Thanks!

-- 
Regards,
Aleksey Cherepanov
