Date: Mon, 01 Aug 2011 23:15:26 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: Character casing question for U+0131

On 2011-08-01 00:03, jfoug wrote:
> This is asked to the list as a whole, in hopes that someone will have
> the answer. I am hoping this can be easily answered, and then all
> formats can conform to a single method.

I'm afraid there is no one-size-fits-all answer. We need to establish
the behaviour for every format that does uppercasing and does not use
Unicode internally. Fortunately the number of formats that uppercase is
low!

> Ok, in cp850, character at 0xD5 is U+0131. This is the undotted lower
> case ‘I’ character. Now, in Unicode, this character DOES upcase. It
> upcases to normal cap I. Thus, this is a non-circular upcase. In cp850,
> lc(uc(char(0xD5))) != char(0xD5), but instead == char(0x69)
>
> If this is the proper behavior, then what I have right now, in Unicode.c
> and rules.c is correct. I handle cp850 differently, as after the normal
> building of the upcase/downcase set of arrays, I change one element in
> the upcase array, to handle character 0xD5 upcasing into character 0x49.
> This works great, AS LONG AS the actual formats, and or OS code page
> logic works that way (for cp850).

Empirical data is what gets us forward: Here are *real* hashes from a
Windows XP system running with OEM codepage 437.

pound:1009:ED731A96A0C79241AAD3B435B51404EE:E1AE1BF327FBCC23730F7DB73A56AC44:::
dotless-i:1010:F7E62F36F8DB5AE6AAD3B435B51404EE:5CF982AC5D8263F6F42A88C1816218C4:::
german-ss:1011:83DC881CE3412BC5AAD3B435B51404EE:D0EE6EDA1C675ED9196A449872AEEA84:::
micro:1012:866B72239BB4C2CBAAD3B435B51404EE:0F6CE4C114FB6047318D15A2F0EBBFAC:::
o-diaeresis:1013:350AACEB37EDB148AAD3B435B51404EE:F8B057EF7946389887E5C5868A0969B1:::

Now, you can crack the NT hashes using -enc=utf8 and the following
dictionary (encoded as UTF-8 of course):

£
ı
ß
µ
ö

Notes for NT:
1. The German double-s (ß) is NOT uppercased to SS (just as we thought)
2. The micro sign is NOT uppercased to the Greek uppercase version of
   that character (Unicode specs suggest that could be done)

And here are the characters that will crack the LM part (if encoded in
cp437):

£
I
ß
µ
Ö

Notes for LM:
1. The dotless i is not present in cp437! *Regardless* of that, it was
   uppercased to I (which of course does exist). I know that when using
   a euro sign as password, there will be an "empty" LM hash, and that
   was what I expected here too. Very interesting.
2. The German double-s (ß) is NOT uppercased to SS (just as we thought)
3. The German/Nordic o-with-dieresis (ö) is uppercased, as expected

Conclusions:
That dotless-i thing in LM was news to me. This means you do the right
thing for cp850, but probably not for cp437... Other than that, the
behaviour is what I thought.

Something similar to this should be done for Oracle on Unix, Oracle on
Windows, and a number of other formats.

magnum
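
The casing cases discussed in this message can be compared against what a plain Unicode case mapping gives when it is squeezed back into a single-byte code page. The following is only a rough Python sketch, not JtR code: it assumes Python's built-in cp437 and cp850 codecs and `str.upper()`/`str.lower()` as a stand-in for however Unicode.c really builds its arrays, and the helper names (`upcase_byte`, `downcase_byte`, `_map_byte`) are hypothetical.

```python
# Sketch under stated assumptions: Python's codecs and Unicode case
# mappings stand in for the real upcase/downcase arrays in Unicode.c.

def _map_byte(byte, codepage, convert):
    """Case-convert one code page byte via Unicode.

    The byte is returned unchanged when the converted text is not exactly
    one character (e.g. the full mapping of sharp s is "SS") or when the
    result has no encoding in the code page.
    """
    try:
        ch = bytes([byte]).decode(codepage)
    except UnicodeDecodeError:
        return byte  # undefined position in this code page
    mapped = convert(ch)
    if len(mapped) != 1:
        return byte
    try:
        return mapped.encode(codepage)[0]
    except UnicodeEncodeError:
        return byte


def upcase_byte(byte, codepage):
    return _map_byte(byte, codepage, str.upper)


def downcase_byte(byte, codepage):
    return _map_byte(byte, codepage, str.lower)


def is_encodable(ch, codepage):
    """True if the character exists in the given code page."""
    try:
        ch.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # cp850: dotless i (0xD5) upcases to 'I' (0x49), which downcases to
    # 'i' (0x69), so the round trip is non-circular.
    up = upcase_byte(0xD5, "cp850")
    print(hex(up), hex(downcase_byte(up, "cp850")))

    # Sharp s (0xE1) and micro sign (0xE6) stay as they are; o-diaeresis
    # (0x94) upcases to 0x99.
    for byte in (0xE1, 0xE6, 0x94):
        print(hex(byte), "->", hex(upcase_byte(byte, "cp850")))

    # Dotless i cannot be represented in cp437 at all.
    print(is_encodable("\u0131", "cp437"))
```

For the cp850 bytes, this sketch agrees with the characters listed for the LM part: ß and µ are left alone, ö becomes Ö, and dotless i becomes I. It says nothing about how Windows arrives at I under cp437, where the character has no byte to start from.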