
Date: Thu, 19 Jan 2012 23:34:12 +0400 From: Solar Designer <solar@...nwall.com> To: valentino.angeletti@...l.com Cc: osssecurity@...ts.openwall.com, bugtraq@...urityfocus.com, tytso@....edu Subject: Re: pwgen: nonuniform distribution of passwords On Thu, Jan 19, 2012 at 09:21:17AM +0100, valentino.angeletti@...l.com wrote: > may ask you what software (and how it works brute force ecc) you used? John the Ripper, indeed  generating a custom .chr file (which is based on trigraph frequencies) from a sample of 1 million of pwgen'ed passwords and then using this file to crack another (nonoverlapping) sample of pwgen'ed passwords. My initial notification to osssecurity and Bugtraq included these links, which describe this in more detail: http://www.openwall.com/lists/johnusers/2010/11/17/7 http://www.openwall.com/lists/johnusers/2010/11/22/5 http://www.openwall.com/lists/johnusers/2010/11/28/1 http://www.openwall.com/lists/johnusers/2010/12/06/1 However, as I wrote in a followup posting to osssecurity 2 days ago: "I might update/revise my analysis on this issue in a few days. Specifically, I now suspect that a (large) part of the apparent nonuniformity of the distribution was in fact an artifact of my analysis approach. I only analyzed sets of 1 million of pwgen'ed passwords, so I could not directly check the distribution of full passwords (1 million is too little, even compared to the small keyspace of these passwords), whereas JtR only uses trigraph frequencies. I am now generating 1 billion of pwgen'ed passwords, which should take a couple of days to complete. [...]" http://www.openwall.com/lists/osssecurity/2012/01/17/14 This has in fact completed by now: $ ./pwgen 1cn 8 1000000000  dd obs=10M > 1g 17578125+0 records in 858+1 records out 9000000000 bytes (9.0 GB) copied, 147496 seconds, 61.0 kB/s And I analyzed this larger sample briefly: $ time ~/john/john1.7.9jumbo5/run/unique v mem=25 1gu < 1g Total lines read 1000000000 Unique lines written 697066573 real 144m40.619s user 142m8.727s sys 0m39.645s So that's 697 million unique passwords in 1 billion, which for a uniform distribution would correspond to a total keyspace size of 1.3 billion: $ ./solve 697066573 1000000000 1296935185 I've attached the solve.c program to this message. [ BTW, I verified that there's no fatal precision loss in its expected_different() function (despite of the risky expression) for the value ranges on which it is called here. I did so by also computing the expected different value with a different (much slower) algorithm  just not as part of equation solving (which would be slower yet). ] However, let's see what numbers we get for smaller samples (actually, subsets of the 1 billion sample above, but that's OK in this case): Total lines read 100000000 Unique lines written 89163247 Total lines read 10000000 Unique lines written 9811335 Total lines read 1000000 Unique lines written 997978 $ ./solve 89163247 100000000 427419891 $ ./solve 9811335 10000000 261676022 $ ./solve 997978 1000000 246946702 As we can see, the guess for the total keyspace size keeps increasing as we increase the sample size. That's under assumption that we have a uniform distribution. Hence, our distribution is nonuniform. That said, the keyspace may in fact be smaller than I had expected, although I haven't hit it with my 1 billion sample yet. So we have a mix of two problems here: likely small keyspace and nonuniform distribution. My John the Ripper pwgen.chr attack was probably testing a lot of passwords that are actually impossible, so a much faster attack (even more specifically focused on pwgen'ed passwords) should be possible. I think I underestimated just how much smaller pwgen's pronounceable passwords keyspace is compared to the full {62 different, length 8} keyspace, although we still do not have the exact number. I continue to think that the primary problem in terms of pwgen use is that these passwords look much stronger than they actually are. For example: $ pwgen athu9Bee Vae0jexa rae2Oa1c Aim8Ku3c No5aep0F OhY5quee ieVae2ti wah1aiM2 oaNg1oth baePule5 sod8oH6i ohfoh5Du Pai9Uch7 AeG3bies Maev6tae iKievae9 zo9eiSai Xito9aid iGh3ay8s owib0Ub8 Yahm0oaC Wu3VaiK7 IeK3sah2 xai7Eico ... Looking at these, how many people would realize that the keyspace for them may be thousands of times smaller than the full {62 different, length 8} keyspace and that the distribution may be nonuniform? Based on the 1 billion sample, the keyspace is 168,350 times smaller, although this estimate has the nonuniformity "factored in" (a larger sample would show a somewhat larger keyspace estimate). A partial fix may be for pwgen to print a warning each time it is used in this mode and with output to a tty (it already behaves differently based on whether its output is a tty or not, so that won't be a new drawback). Also, the default mode may be changed to the "secure" one, with the weak alternative available via a nondefault option. <plug> Or indeed people can just use pwqgen instead: http://www.openwall.com/passwdqc/ $ for n in {1..10}; do pwqgen; done Warm5Claw4Blame hungry5tomato3Yeah Midst_Vowel9Spate Ohio7steak$Mild Taxi&desert+gorge fondPint=easy mode6oldest5chief Defeat7OvalAnew vomit+ate2Slid tehran8hang3ritual These are 47bit (similar to the full {62 different, length 8} keyspace, but at a longer length and easier to memorize). This can easily be adjusted from the commandline: $ for n in {1..3}; do pwqgen random=30; done spend!deep Alkaliself decay9your $ for n in {1..3}; do pwqgen random=64; done meetdraft9Gun*wire inner+Rusty4dogma3tape Switch8Sword9even=Viral The 30bit ones above are comparable to pwgen's in security (about as weak). In fact, they may be slightly better due to uniform distribution (as long as /dev/urandom works well). The 64bit ones are unreasonable for most users/uses. The default of 47 bits is reasonable, although as always this depends on threat model. </plug> Alexander #include <math.h> #include <stdio.h> #include <stdlib.h> static double expected_different(double keyspace, double subset) { return keyspace * (1.0  pow((keyspace  1.0) / keyspace, subset)); } int main(int argc, char **argv) { double unique, total, keyspace, min, max; if (argc != 3) return 1; unique = atof(argv[1]); total = atof(argv[2]); min = 0; max = pow(62, 8); do { keyspace = (min + max) * 0.5; if (expected_different(keyspace, total) > unique) max = keyspace; else min = keyspace; } while (max  min > 0.5); printf("%.0f\n", keyspace); return 0; }
Powered by blists  more mailing lists
Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.