Date: Fri, 12 Aug 2011 22:05:39 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: Unicode, casing, obtaining data, and some real-world
 MSSQL (2000) data.

On 2011-08-12 20:34, jfoug wrote:
> Well, getting that 100% workable, and being able to do things like properly
> collate things such as "MASSE" and "Maße" is not the real ‘purpose’ we need
> in john.

This depends on where/why we uppercase. If we uppercase within a format, 
like LM or MSSQL, we should obviously uppercase just like the native 
format would (maße -> MAßE in all cases we've seen). But if I feed a 
German lowercase wordlist to john, attacking a case-significant format 
and using rules for permutations, I would want maße -> MASSE, because 
that is how a German would likely write it.
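
The two behaviours can be sketched as follows. This is only an 
illustration in Python, not john code: the function names upper_simple 
and upper_full are made up here, and the "simple" variant merely 
reproduces the maße -> MAßE result observed above. It is an assumption 
that LM or MSSQL arrive at that result the same way.

```python
def upper_simple(s):
    """One-to-one uppercasing: a character is replaced only when its
    uppercase form is exactly one character, otherwise it is kept."""
    out = []
    for ch in s:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


def upper_full(s):
    """Full Unicode case mapping, which may change the string length."""
    return s.upper()


if __name__ == "__main__":
    word = "maße"
    print(upper_simple(word))  # format-style uppercasing
    print(upper_full(word))    # wordlist-rule-style uppercasing
```

The point of keeping them separate is that the first never changes the 
length of the candidate, while the second can.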

> What I found here, is several things. First, if the _wsetlocale() was not
> called, then the only upcasing/lowcasing was A..Z <-> a..z.  Then, if
> _wsetlocale() was called (with a valid locale), then the exact same casing
> was happening, NO MATTER WHAT locale is used.  Remember, we are in Unicode,
> so the OS simply turns on the above 0x7F casing rules, but they are the same
> for the OS.

Are you saying that if you set a locale it would go from just a-z to 
complete Unicode - BUT using the system locale instead of the one you 
specified? That's weird, and kinda defeats the whole purpose of 
_wsetlocale().

> Thus, when I do release this, it will likely be an initial release, and need
> some work tweaking it.  Also, I had some problems with magnum's recent UTF-32
> changes.  I need to work through some of that with him, as I do not fully
> understand all of that code.

Do you mean the reinstated "third case" in utf8towcs()? It does not 
convert to UTF-32 but to UTF-16 with surrogate pairs. I expect Windows 
UTF-16 hashes to be just like that but I haven't confirmed it with 
empirical data. I tested it against Perl (pass_gen.pl).

At some point we will need conversions to UTF-32 (no surrogate pairs) 
too but I won't touch that until I see a format that hashes UTF-32.
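
To make the distinction concrete, here is a minimal sketch in Python. 
It is not john's utf8towcs(): the function names are hypothetical and 
the UTF-8 decoding is left to Python's own codec. It only shows what 
"UTF-16 with surrogate pairs" versus "UTF-32, no surrogate pairs" means 
for a code point above U+FFFF.

```python
def utf8_to_utf16_units(data):
    """Return UTF-16 code units; code points above U+FFFF are split
    into a high and a low surrogate."""
    units = []
    for ch in data.decode("utf-8"):
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)
        else:
            v = cp - 0x10000                    # 20 bits remain
            units.append(0xD800 | (v >> 10))    # high surrogate: top 10 bits
            units.append(0xDC00 | (v & 0x3FF))  # low surrogate: bottom 10 bits
    return units


def utf8_to_utf32_units(data):
    """Return UTF-32 code units: one per code point, no surrogates."""
    return [ord(ch) for ch in data.decode("utf-8")]


if __name__ == "__main__":
    sample = "a\U0001F600".encode("utf-8")
    print([hex(u) for u in utf8_to_utf16_units(sample)])
    print([hex(u) for u in utf8_to_utf32_units(sample)])
```

For anything inside the BMP the two outputs hold the same values, so 
the difference only shows up for 4-byte UTF-8 sequences.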

magnum
