Date: Wed, 21 Oct 2015 21:50:25 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: locale

On 2015-10-21 20:07, Solar Designer wrote:
> On Wed, Oct 21, 2015 at 09:21:27AM +0200, magnum wrote:
>> I'm now checking/setting locale (if autoconf says I can) and fall back
>> to skipping the degree sign. Let me know if it misbehaves.
>
> Looking at these changes, I realize that my idea was probably bad:
> initializing the locale with setlocale() affects lots of things,
> including the ctype macros. With some cracking modes, this might affect
> what candidate passwords they generate. IIRC, we avoided using the
> ctype macros in our wordlist rules engine, but now that I grep e.g. for
> "islower", I find uses in dynamic_compiler.c, jumbo.c, mask.c.

I wasn't aware of these uses and we should replace them. Actually, the
one in mask.c is kind of correct: It's for case-toggling the base word
in hybrid mode, and just being able to do so with ASCII is a limitation.
But we must honor our encoding options, not the terminal locale.

> While we might later choose to add initializing locale to JtR for other
> reasons, I think DEGREE_SIGN alone isn't a sufficient reason, and if we
> do add locale support, we should make it consistent: initialize it all
> the time and do so early on, and not only do it for OpenCL and CUDA
> formats like the current code does.

I agree that introducing a locale for the degree sign alone is overkill.
I was just moving slowly: I actually had some vague idea that the
arguable UTF-8 defaults (just the parts that affect screen output, in
particular the "AlwaysReportUTF8 = Y") could be made depending on
locale. But maybe we should back away from setlocale instead, at least
for now.

> For now, maybe we should in fact check env vars explicitly to decide on
> DEGREE_SIGN.
>
> A maybe acceptable hack (for jumbo) is to do something like:
>
>     setlocale(LC_ALL, "");
>     ... check for UTF-8 here ...
>     setlocale(LC_CTYPE, "C");
>
> so that ctype macros are unaffected by the current locale (since our
> uses of them appear to be of the kind where we prefer consistency over
> customization; arguably, this means they are misuses). But we'll need
> to do it all the time, and early on, to ensure consistent behavior
> regardless of whether an OpenCL or CUDA format is run.
>
> Also, the current checks for strchr(setlocale(LC_ALL, NULL), '.') do not
> tell us whether the locale is UTF-8 or not. We'll need to do better.

The current implementation is not limited to UTF-8, it will also get you
a proper degree sign for legacy codepages like ISO-8859-*, CP* or
KOI8-R. For this to work I can't reset it back to C, and checking for
UTF-8 is irrelevant (the current check for '.' is mostly a check for
'neither "C" nor "POSIX" but some complete "aa_BB.CCCC" setting').

Anyway, you point out potential problems I did not realize. I think I'll
just drop the use of setlocale for now but I'll sleep on it.

magnum
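
For illustration, the hack quoted above could be fleshed out roughly as follows. This is only a sketch under stated assumptions, not code from the JtR tree: the names codeset_is_utf8() and probe_utf8_and_pin_ctype() are made up here, and it assumes a POSIX system that provides nl_langinfo(CODESET), which reports the codeset by name rather than relying on a '.' in the locale string. As noted above, pinning LC_CTYPE back to "C" gives up locale-aware output for legacy codepages, so this only fits the "UTF-8 or nothing" variant.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: does a codeset name denote UTF-8?
 * Accepts both common spellings, case-insensitively. */
static int codeset_is_utf8(const char *codeset)
{
    assert(codeset != NULL);
    return !strcasecmp(codeset, "UTF-8") || !strcasecmp(codeset, "UTF8");
}

/* Hypothetical helper: probe the environment's locale for UTF-8, then
 * pin LC_CTYPE back to "C" so islower() and friends stay ASCII-only.
 * Returns 1 if the environment's locale uses UTF-8, 0 otherwise. */
static int probe_utf8_and_pin_ctype(void)
{
    int utf8 = 0;

    /* Adopt whatever LC_ALL / LC_CTYPE / LANG select; on failure the
     * locale is left unchanged and we simply report "not UTF-8". */
    if (setlocale(LC_ALL, ""))
        utf8 = codeset_is_utf8(nl_langinfo(CODESET));

    /* Undo only the character-classification part of the locale. */
    setlocale(LC_CTYPE, "C");

    return utf8;
}
```

To behave consistently it would have to be called once, early in main(), before any format is initialized, so that the result does not depend on whether an OpenCL or CUDA format is run.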