Date: Tue, 17 Jul 2012 15:47:13 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: problem with disc space for shared files in MJohn

On Tue, Jul 17, 2012 at 10:28:36AM +0200, Frank Dittrich wrote:
> On 07/15/2012 01:24 PM, Aleksey Cherepanov wrote:
> > I heard that some users have about 40gb of wordlists individually.
> > Currently it would be a problem if MJohn would copy all files to the
> > server.
>
> We might even need some precautions against using dictionaries that
> differ only in the sequence of words.
>
> Imagine someone got rockyou.txt in the original sequence (sorted by
> descending frequency), and someone else sorted the file alphabetically.
> (There might even be different sort sequences, depending on locale
> settings.)
>
> It is obvious that running the same kind of attacks using both of these
> files is pointless.
>
> We can't just treat both versions of the file as the same file.
> Otherwise, an interrupted session cannot be restored on another client.
>
> Furthermore, the file sorted by frequency usually is the preferred one.
> (Just in case later on we want to try more complex rules on the top
> 1000 passwords of this wordlist...)
>
> Similar issues could exist with two files that only differ in line
> endings (<LF> vs. <CR><LF>).

We could normalize all files before sharing, i.e. s/\r?\n/\n/ .

> Even files which just use different encodings should be taken care of,
> if the majority of words contain just ASCII characters, and only very
> few contain non-ASCII characters.
>
> Do we need to implement some checks for newly added files, and issue a
> warning whenever a new file is added which has the same size and/or the
> same number of lines as an already existing file?

What if someone adds one word to the wordlist?
(In this case, for a wordlist attack we only need to try the one
additional word, though if we care about more complex attacks, like
mixing words into passphrases, it is much harder to extract the
complement.)

For now I think this is not necessary. There are some easy checks
(number of lines) that help in searching for similar files, but I think
it would be hard to handle all cases, and I doubt it is worth the
effort. However, we could add something more general, such as an
estimate of the size of the diff (though that does not catch the
encoding case, where many lines differ byte-wise but are the same in
meaning).

> (OTOH, the same problem can exist with rule sections only differing in
> the sequence of rules, or rule sections with many overlapping rules.)

I think for now this should be handled manually. We can compensate for
it elsewhere: for instance, with good dispatching we could assume that
an experienced member would not start an attack with an alphabetically
sorted rockyou, while an attack from an inexperienced member would not
take much CPU time anyway.

Thanks!

-- 
Regards,
Aleksey Cherepanov