john-users - Re: how to perform dictionary "subtraction"



Message-ID: <20200212235301.GA8414@openwall.com>
Date: Thu, 13 Feb 2020 00:53:01 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: how to perform dictionary "subtraction"

On Wed, Feb 12, 2020 at 01:28:15PM +0100, Matus UHLAR - fantomas wrote:
> On 12.02.20 10:27, Johny Krekan wrote:
> >Imagine you have two sorted wordlists:
> >one which is smaller and contains, for example, 5 words:
> >address
> >book
> >is
> >important
> >useful
> >and one larger which contains 10 words. The problem is: how could I
> >filter this larger wordlist to create a third
> >wordlist which does not contain any word from the smaller one?
> 
> you can try either "comm -23 wordlist-big wordlist-small" 
> or "join -v1  wordlist-big wordlist-small" 
> 
> ...provided you use unix-like system.

Please note that both of these commands assume that the input wordlists
are sorted.  Moreover, they assume the wordlists were sorted with the
same locale settings that are in effect when the commands are run.  Any
discrepancy will yield wrong results.
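
One way to rule out a locale mismatch is to re-sort both lists and run
the comparison under the same explicitly chosen locale.  The following
is only a sketch under that assumption: the sample words in the larger
list and the "*.sorted" file names are made up for illustration, and
"C" is just one possible locale (any locale works as long as it is the
same for every command).

```shell
#!/bin/sh
# Sketch: sort both wordlists under one fixed locale, then subtract.
# The *.sorted names and the sample contents are illustrative only.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Sample data: the 5-word list from the question and a 10-word list.
printf '%s\n' address book is important useful > wordlist-small
printf '%s\n' zebra address Book useful apple is important book cat dog > wordlist-big

# Sort both inputs with the same locale that comm will run under.
LC_ALL=C sort wordlist-small > wordlist-small.sorted
LC_ALL=C sort wordlist-big > wordlist-big.sorted

# -23 suppresses columns 2 and 3, leaving lines found only in the first file.
LC_ALL=C comm -23 wordlist-big.sorted wordlist-small.sorted > wordlist-out
cat wordlist-out
```

Note that "Book" and "book" are different lines here, so only the
lowercase one is removed.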

> however this is not question related to JTR...

JtR includes its own tool called "unique", which in jumbo has these
options:

-ex_file=FILE      the data from FILE is also used to unique the output, but
                   nothing is ever written to FILE
-ex_file_only=FILE assumes the input is already unique, and only checks
                   against FILE (again the latter is not written to)

So it can be used like this:

./unique wordlist-out -ex_file_only=wordlist-small < wordlist-big

This does not require the wordlists to be sorted.
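
To illustrate why sorting is unnecessary for this kind of subtraction,
here is a rough stand-in using plain awk.  This is not how "unique" is
implemented and makes no claim about its internals; it merely shows the
same idea of remembering every line of the smaller list and then
streaming the larger one past it.  The sample words in the larger list
are made up.  Like -ex_file_only, this sketch does not remove
duplicates within the larger list itself.

```shell
#!/bin/sh
# Sketch: order-independent wordlist subtraction with awk (a stand-in,
# not JtR's "unique").  Sample contents are illustrative only.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Deliberately unsorted inputs.
printf '%s\n' useful is address important book > wordlist-small
printf '%s\n' zebra address apple useful is cat important book dog tree > wordlist-big

# While reading the first file, record each line as an array key.
# For the second file, print only lines that were never recorded.
awk 'FILENAME == ARGV[1] { skip[$0]; next } !($0 in skip)' \
    wordlist-small wordlist-big > wordlist-out
cat wordlist-out
```

The output keeps the original order of the larger list.  The whole
smaller list is held in memory, so this only suits cases where that
list fits in RAM.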

For speed, the "-buf" option can also be used, e.g. "-buf=25" to use
25 GB of RAM (only if you have that much free RAM and the wordlists are
large enough to benefit from it).

Alexander
