Date: Wed, 12 Sep 2018 13:37:31 +0200
From: Matlink <matlink@...link.fr>
To: john-users@...ts.openwall.com
Subject: good program for sorting large wordlists



On 11/09/2018 at 17:42, Solar Designer wrote:
> Hi,
>
> On Tue, Sep 11, 2018 at 05:19:18PM +0200, JohnyKrekan wrote:
>> Hello, I would like to ask whether anyone has experience with a good tool for sorting large text files, with capabilities similar to GNU sort. I am using it to sort wordlists, but when I tried to sort an 11 GB wordlist, it crashed while writing the final output file, after writing around 7 GB of data, and did not delete some temp files. Sorting a smaller (2 GB) wordlist took just about 15 minutes, while this 11 GB one took 4.5 hours (Intel Core i7 2.6 GHz, 12 GB RAM, SSD drives).
>
> As to sorting, recent GNU sort from the coreutils package works well.
> You'll want to use the "-S" option to let it use more RAM, and fewer
> temporary files, e.g. "-S 5G".  You can also use e.g. "--parallel=8".
>
> As to it running out of space for the temporary files, perhaps you have
> your /tmp on tmpfs, so in RAM+swap, and this might be too limiting.  If
> so, you may use the "-T" option, e.g. "-T /home/user/tmp", to let it use
> your SSDs instead.  Combine this with e.g. "-S 5G" to also use your RAM.
As Alexander said, you should use the "--parallel" option for such big
files. And yes, you'll need temporary files, and therefore a folder that
can hold huge files. I regularly sort files of dozens of gigabytes, and
it takes time, but rarely more than 1 hour.
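
To make the options above concrete, here is a minimal sketch of such an
invocation. It runs on a tiny sample file so it can be tried safely; the
file names, the "64M" buffer, the thread count, and the scratch directory
are placeholders I made up, not values from this thread. "LC_ALL=C" and
"-u" are my own additions (byte-order comparison and duplicate removal,
which often make sense for wordlists); drop them if you don't want them.
For a real 11 GB file you would raise "-S" (e.g. "-S 5G") and point "-T"
at a directory on a disk with plenty of free space.

```shell
#!/bin/sh
# Sketch only: paths, sizes and thread count are illustrative assumptions.
set -eu

workdir=$(mktemp -d)          # scratch area for this demo
tmpdir="$workdir/sort-tmp"    # stand-in for a directory on a large disk, e.g. /home/user/tmp
mkdir -p "$tmpdir"

# Tiny sample wordlist standing in for the multi-gigabyte file.
printf '%s\n' password letmein 123456 password qwerty > "$workdir/wordlist.txt"

# LC_ALL=C   : compare raw bytes instead of using locale collation (assumption, not from the thread)
# -S         : size of the in-memory buffer
# -T         : directory where temporary files are written
# --parallel : number of sorts run concurrently
# -u         : drop duplicate lines (assumption, not from the thread)
# -o         : output file
LC_ALL=C sort -S 64M -T "$tmpdir" --parallel=2 -u \
    -o "$workdir/wordlist.sorted.txt" "$workdir/wordlist.txt"

cat "$workdir/wordlist.sorted.txt"
```

The demo leaves its scratch directory in place so the result can be
inspected; remove it by hand afterwards.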

-- 
Matlink - Sysadmin matlink.fr
Stay protected, encrypt your emails: https://café-vie-privée.fr/
XMPP/Jabber: matlink@...link.fr
PGP public key: 0x186BB3CA
Off-the-record fingerprint: 572174BF 6983EA74 91417CA7 705ED899 DE9D05B2

