Date: Mon, 25 Jan 2021 19:54:33 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: source of information for John's charset files

On Mon, Jan 25, 2021 at 07:35:48PM +0100, Johny Krekan wrote:
> I understand right that the list rockyou which you used had duplicate
> words for example 2x word example in it. My question is what is the
> reason or advantage of using such wordlist with duplicates in comparison
> with wordlist with no duplicates? If I create one .pot file from the
> rockyou with no duplicates would it provide worse probability in finding
> the password during same time as yours?

A reasonable expectation is that inclusion of duplicates in the training
set increases the number of cracked accounts rather than cracked unique
passwords in subsequent password security audits.  Conversely, omitting
the duplicates would possibly optimize for cracking more unique passwords
but perhaps fewer accounts.

An alternative hypothesis is that inclusion of duplicates might also help
crack more unique passwords that are based on frequent substrings even if
those came from fully duplicate passwords, since otherwise those
substrings would be under-represented in the training set.

You or/and others are welcome to research whether these hypotheses are
true or not.  I no longer recall the results of my own testing from back
when I made this choice (IIRC, in 2013).

Alexander
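
Editorial illustration (not part of the original message): the two
metrics discussed above, cracked accounts versus cracked unique
passwords, and the effect of duplicates on training statistics can be
sketched with a toy example. The password list, the function names, and
the use of plain character counts are all assumptions made for
illustration; this does not reproduce how John's charset files are
actually generated.

```python
from collections import Counter

# Hypothetical toy data: one entry per account, so duplicates are kept.
accounts = ["123456", "123456", "123456", "password", "password", "dragon12"]


def char_frequencies(words):
    """Count how often each character occurs across all given words."""
    counts = Counter()
    for word in words:
        counts.update(word)  # a str is iterated character by character
    return counts


def audit(guesses, target_accounts):
    """Return (cracked accounts, cracked unique passwords) for a guess list."""
    guessed = set(guesses)
    cracked_accounts = sum(1 for pw in target_accounts if pw in guessed)
    cracked_unique = len(guessed & set(target_accounts))
    return cracked_accounts, cracked_unique


# Training statistics with duplicates weight frequent passwords more heavily.
with_dupes = char_frequencies(accounts)
without_dupes = char_frequencies(set(accounts))

# One guess at the most common password versus two guesses at rarer ones.
print(audit(["123456"], accounts))
print(audit(["password", "dragon12"], accounts))
```

In this toy set both guess lists crack the same number of accounts, but
the second one cracks more unique passwords, which is the distinction the
first hypothesis draws. Whether either effect holds on real data is
exactly what would need to be tested.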