Date: Mon, 29 Jun 2015 11:09:02 -0400
From: Matt Weir <cweir@...edu>
To: john-dev@...ts.openwall.com
Subject: Using Probabilistic Context Free Grammars (Was precomputed attacks)

>> I see, there is UnLock and it generates better wordlists than john in
>> some cases. Is it Free Software? Why is not it integrated into john?

I decided to fork this thread to keep things from getting too mixed up between PCFGs and pre-computed attacks. I'm not a huge fan of the name UnLock, so I'll just refer to the basic idea as using Probabilistic Context Free Grammars (PCFGs) instead.

The version I worked on is free software (GPLv2), and it does use bits and pieces from JtR. There is another version, a re-write, that students and staff at Florida State University are still developing, but I don't have the source code for that.

I have a very poorly maintained github page for it available at:
https://github.com/lakiw/pcfg_cracker/
but I basically was using that for my own development efforts and the code is in a very unstable state. A more stable, but older, version of it is available at:
https://sites.google.com/site/reusablesec/Home/password-cracking-tools/probablistic_cracker

I never bothered to directly integrate it into JtR simply because the -stdin option worked well enough for my needs. Direct integration into JtR would be nice, but I don't think the PCFG cracker is mature enough for that to make sense right now. Aka I'd like to limit the amount of buggy software in JtR ;p

Ok, I probably should take a step back and give a high level overview of using PCFGs to model human password creation strategies. The easiest way to think about a PCFG attack is as a model. We already have a lot of models in JtR, and many of them are wicked advanced. Wordlists model password creation strategies using dictionaries and mangling rules.
Markov and Incremental mode model password creation strategies by using the conditional probability of individual characters to generate guesses. PRINCE models passwords by combining words from a dictionary together. PCFGs are just another way to model users.

I started using PCFGs when I was trying to make an automated rule generator that I could train on previously disclosed passwords. My first attempts were very much like what Bartavelle has done below, though his code is way better than what I did:
https://github.com/bartavelle/rulesfinder

Unfortunately that approach had some limitations, and I ran into issues when trying to combine different mangling techniques and optimizations. Aka if you don't see "append 4 digits" + "append '$'" together in your training set, should you still try that combination? One solution is to make the rule more generic, say "append 4 digits" + "append 1 special char", but then you lose a lot of the potential optimizations, since some special characters are way more common than others. The same goes for digits, where you might want to try years (or series like '1234') before you try generic numbers.

In addition, I really wanted to add support for multiple input dictionaries, since the current setup of having to run multiple cracking sessions with different input dictionaries (small targeted ones and then bigger ones) and different rulesets is annoying. Some words are way, way more common than others (for example "password"), so you might want to try advanced mangling rules on them before you try a basic mangling rule on a less common word. But at the same time, you might want to try a basic mangling rule on a less common word, producing a guess like 'chair123', before you try crazy mangling rules producing guesses like 'PasSword423&'. Balancing that in the current one-dictionary setup is difficult.

This led me and my group to start using PCFGs. Breaking down that term: basically we assign everything probabilities.
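That balancing problem boils down to ordering (word, rule) pairs by their joint probability rather than exhausting one dictionary with one ruleset at a time. Here's a rough Python sketch of the idea; the words, rules, and probabilities are toy numbers I made up for illustration, not output from any real training run, and I'm assuming word choice and rule choice are independent:

```python
import heapq

# Toy probabilities, made up for illustration -- not from a real training set.
words = {"password": 0.05, "chair": 0.002}       # P(base word)
rules = {"append '123'": 0.04,                   # P(mangling rule)
         "leet+caps+symbol": 0.0005}

def guesses_in_probability_order(words, rules):
    """Yield (word, rule, probability) tuples, highest joint probability
    first, assuming word and rule choices are independent."""
    scored = [(-wp * rp, w, r)
              for w, wp in words.items()
              for r, rp in rules.items()]
    heapq.heapify(scored)                # min-heap on negated probability
    while scored:
        neg_p, w, r = heapq.heappop(scored)
        yield w, r, -neg_p

for w, r, p in guesses_in_probability_order(words, rules):
    print(f"p={p:g}  {w!r} + {r!r}")
```

With these toy numbers, the simple rule on 'chair' gets tried before the complex rule on 'password', which is exactly the interleaving a single input dictionary with a fixed ruleset can't express.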
We assign a probability to the length of a password, a probability to individual dictionary words, a probability to individual digits, a probability that digits appear at the end vs the beginning, a probability to the capitalizations used, etc. That's the P in PCFG.

My group then designed a CFG based on those probabilities. We made it context free since we assumed that in *most* cases the different transitions are independent of each other. Aka the digit at the end of the password does not have anything to do with the base dictionary word used or the capitalization chosen. There are exceptions, and there are a couple of things we did to sneak context-specific info (such as password complexity requirements) into our grammar, but that can wait for a later discussion.

Being context free also makes training easier: if we see a date such as '2015' show up anywhere in the training dataset, it gets a probability, vs only being detected in the context of a specific mangling rule. You can also modify the grammar later to make target-specific adjustments (for example, you can raise the probability of dates and words that hold significance to the target, such as birthdays and names of pets).

The current version of my PCFG cracker outputs guesses to stdout for use by a tool like JtR. Another option would be for it to create very fine grained mangling rules to be used in a traditional wordlist style attack. The only downside then is you would lose the ability to have multiple input dictionaries with different probabilities.

Now for the downsides. The overhead is pretty high, so against fast hashes the increase in precision isn't worth the slower guessing rate. Also the grammar is imprecise enough that against really slow hashes hand-written mangling rules are still the way to go. So it currently performs best against medium speed hashes.
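To make the grammar idea concrete, here's a hand-written toy grammar and a sketch of generating guesses from it in probability order. This is far simpler than what my tool actually does (the structures, terminals, and probabilities below are invented for the example), and a real cracker would generate guesses lazily with a "next" function instead of expanding everything up front:

```python
import heapq
from itertools import product

# A hand-written toy grammar -- invented for illustration, not trained data.
# "structures" says how a password is shaped; the other entries give the
# probability of each terminal replacement.
grammar = {
    "structures": {("word", "digits"): 0.6, ("word",): 0.4},
    "word":   {"password": 0.5, "monkey": 0.3, "chair": 0.2},
    "digits": {"123": 0.5, "2015": 0.3, "7": 0.2},
}

def generate(grammar):
    """Expand every structure into complete guesses and yield
    (guess, probability) pairs, highest probability first."""
    heap = []
    for struct, struct_p in grammar["structures"].items():
        # Cross product of all terminal choices for this structure.
        choices = [grammar[part].items() for part in struct]
        for combo in product(*choices):
            p = struct_p
            for _, term_p in combo:
                p *= term_p
            guess = "".join(term for term, _ in combo)
            heapq.heappush(heap, (-p, guess))
    while heap:
        neg_p, guess = heapq.heappop(heap)
        yield guess, -neg_p

for guess, p in generate(grammar):
    print(f"p={p:.4f}  {guess}")
```

Note how 'password' alone outranks 'password123', which outranks 'monkey': the guess order falls out of the probabilities rather than a fixed dictionary-times-ruleset ordering.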
Long story short, there is a lot of work that needs to be done if you really want to make this useful for cracking passwords outside of an academic setting. If anyone is interested, it's worth mentioning that there are some areas of the grammar itself that I suspect could be hugely improved. I could talk about that in more detail, but the grammar my tool currently uses is pretty naive and was more of a proof of concept than a well designed model of human behavior. I'm sure there are better ways to model password creation strategies using PCFGs.

As a side note, right now most of the focus with PCFGs has been on the defensive side: using them to generate fake passwords (honeywords), or to evaluate the strength of passwords as part of a password strength checker:
http://dl.acm.org/citation.cfm?id=2420966

Matt