Date: Thu, 15 Feb 2018 15:10:28 +0100 From: Matlink <matlink@...link.fr> To: john-users@...ts.openwall.com Subject: Re: Markov Sampling I am interested by knowing how you calculate the level required to generate plaintext passwords (as you explain in 1)). Le 13/02/2018 à 19:16, Matt Weir a écrit : > So my initial reaction was that I'm doubtful printing out 1/10 of all > the password guesses will give you the end results you are looking > for. Here are a couple of other options: > > 1) You can always work backwards if your target plaintexts are known. > Aka calculate the level required to generate them, (that's easy to do. > Sum up all of the transitions based on your character file). Once you > have that, calculate how many guesses are required to get to that > level which the included tools in JtR will do for you. If you need > additional help on this approach I can write up something more > detailed. This is *super* fast and gives you a pretty good > approximation. > > 2) You can rely on JtR's own logging to see when passwords would be > cracked so that way you are not slowed down by piping the output to > stdout or using a 3rd party tool to figure out how many passwords you > cracked. If you want to go this route I'm sure the people on this list > can help out. > > There's other options as well I guess. You could always modify the > Markov code directly in JtR. If you want other examples of the Markov > code, I re-implemented it in Python for my PCFG cracker. You can see > the guess generation code at: > > https://github.com/lakiw/pcfg_cracker/blob/master/python_pcfg_cracker/pcfg_manager/markov_cracker.py > > Of course, my code is slow so that almost certainly is not the way to > go, but it may be useful as a reference. Long story short, if you let > us know more what you are trying to do I'm sure we can brainstorm some > better options. > > Matt > > > > On Tue, Feb 13, 2018 at 11:47 AM, Solar Designer <solar@...nwall.com> wrote: >> On Tue, Feb 13, 2018 at 04:44:03PM +0100, Matlink wrote: >>>> The pre-defined --external=Parallel mode will do what you ask for. >>>> You'll just need to customize the "node" and "total" numbers in its >>>> init() in john.conf. >>> Well, I guess it's only 'not printing' generated candidates? Does it >>> really speed up the process, since generating a password candidate is >>> more costly than printing it? >> It doesn't speed up the processing inside JtR; it actually adds extra >> processing. >> >>> Concretely, is --markov --stdout --external=Parallel with node 1/100, >>> 100 times faster than with node 1/1? >> No. It's probably roughly same speed: the external mode adds overhead >> internally to JtR, but then those skipped candidates don't need to be >> printed to the Unix pipe. >> >>>> However, note that "every 10th" doesn't necessarily produce a >>>> representative sample: the underlying cracking mode (in this case, >>>> Markov) might happen to have some periodicity in its output, and one of >>>> its period lengths might just happen to be a multiple of 10 or whatever. >>>> So ideally you'd want to randomize the order (if the order somehow >>>> doesn't matter for your research) over a larger number of candidate >>>> passwords - say, pass a million of them through GNU coreutils' shuf(1) - >>>> and then take every 10th out of that randomized list. >>> My issue is that I can't get the whole output because it is too costly >>> for me to gather them due to UNIX pipe. I would like to my >>> >>> john --stdout --markov --sample=100 | my_sublime_post-process >>> >>> be somewhat 100 times faster than >>> >>> john --stdout --markov --sample=1 | my_sublime_post-process >> You could use the built-in --node=1/100 feature, which probably will >> speed things up a lot, but then it almost certainly doesn't result in a >> representative sample - it's just a way to split the work between >> multiple nodes, without regard as to whether each node would get a >> representative sample and be expected to crack a similar percentage of >> real-world passwords that other nodes crack or not (so this probably >> won't be the case, making this approach unsuitable for use in research). >> >> The same applies to incremental mode. >> >>> Your solution requires to get the whole output of john and then >>> post-process it, but I can't find a satisfiable way to get its whole >>> output (since john is really fast to generate candidates). >> A question is whether you actually need to get this many candidates (or >> a sample from this many), or whether fewer would suffice. That depends >> on what your ultimate goal is. >> >> Alexander -- Matlink - Sysadmin matlink.fr Sortez couverts, chiffrez vos mails : https://café-vie-privée.fr/ XMPP/Jabber : matlink@...link.fr Clé publique PGP : 0x186BB3CA Empreinte Off-the-record : 572174BF 6983EA74 91417CA7 705ED899 DE9D05B2
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.